FILTRATED ALGEBRAIC SUBSPACE CLUSTERING

MANOLIS C. TSAKIRIS AND RENE VIDAL


Abstract. Subspace clustering is the problem of clustering data that lie close to a union of 
linear subspaces. Existing algebraic subspace clustering methods are based on fitting the data with 
an algebraic variety and decomposing this variety into its constituent subspaces. Such methods are 
well suited to the case of a known number of subspaces of known and equal dimensions, where a 
single polynomial vanishing in the variety is sufficient to identify the subspaces. While subspaces 
of unknown and arbitrary dimensions can be handled using multiple vanishing polynomials, current 
approaches are not robust to corrupted data due to the difficulty of estimating the number of
polynomials. As a consequence, the current practice is to use a single polynomial to fit the data with a
union of hyperplanes containing the union of subspaces, an approach that works well only when the 
dimensions of the subspaces are high enough. In this paper, we propose a new algebraic subspace 
clustering algorithm, which can identify the subspace S passing through a point x by constructing 
a descending filtration of subspaces containing S. First, a single polynomial vanishing in the variety 
is identified and used to find a hyperplane containing S. After intersecting this hyperplane with 
the variety to obtain a sub-variety, a new polynomial vanishing in the sub-variety is found and so 
on until no non-trivial vanishing polynomial exists. In this case, our algorithm identifies S as the 
intersection of the hyperplanes identified thus far. By repeating this procedure for other points, 
our algorithm eventually identifies all the subspaces. Alternatively, by constructing a filtration at 
each data point and comparing any two filtrations using a suitable affinity, we propose a spectral 
version of our algebraic procedure based on spectral clustering, which is suitable for computations 
with noisy data. We show by experiments on synthetic and real data that the proposed algorithm 
outperforms state-of-the-art methods on several occasions, thus demonstrating the merit of the idea 
of filtrations.

Key words. Generalized Principal Component Analysis, Subspace Clustering, Algebraic Subspace
Clustering, Subspace Arrangements, Transversal Subspaces, Spectral Clustering

1. Introduction. Given a set of points lying close to a union of linear subspaces,
subspace clustering refers to the problem of identifying the number of subspaces, their
dimensions, a basis for each subspace, and the clustering of the data points according
to their subspace membership. This is an important problem with widespread
applications in computer vision [38], systems theory [24] and genomics [17].

1.1. Existing work. Over the past 15 years, various subspace clustering methods
have appeared in the literature [36]. Early techniques, such as K-subspaces [2, 34]
or Mixtures of Probabilistic PCA [30, 13], rely on solving a non-convex optimization
problem by alternating between assigning points to subspaces and re-estimating a
subspace for each group of points. As such, these methods are sensitive to initialization.
Moreover, they require a priori knowledge of the number of subspaces
and their dimensions. This motivated the development of a family of purely algebraic
methods, such as Generalized Principal Component Analysis or GPCA [41],
which feature closed form solutions for various subspace configurations, such as
hyperplanes [40, 39]. A little later, ideas from spectral clustering [44] led to a family of
algorithms based on constructing an affinity between pairs of points. Some methods
utilize local geometric information to construct the affinities [47]. Such methods can
estimate the dimension of the subspaces, but cannot handle data near the intersections.
Other methods use global geometric information to construct the affinities,
such as the spectral curvature [3]. Such methods can handle intersecting subspaces,
but require the subspaces to be low-dimensional and of equal dimensions. In the
last five years, methods from sparse representation theory, such as Sparse Subspace
Clustering [8, 9, 10], low-rank representation, such as Low-Rank Subspace Clustering
[22, 11, 20, 37], and least-squares, such as Least-Squares-Regression Subspace
Clustering [23], have provided new ways for constructing affinity matrices using convex
optimization techniques. Among them, sparse-representation based methods have
become extremely attractive because they have been shown to provide affinities with
guarantees of correctness as long as the subspaces are sufficiently separated and the
data are well distributed inside the subspaces [10, 28]. Moreover, they have also been
shown to handle noise [45] and outliers [29]. However, existing results require the
subspace dimensions to be small compared to the dimension of the ambient space. This is
in sharp contrast with algebraic methods, which can handle the case of hyperplanes.

1.2. Motivation. This paper is motivated by the highly complementary properties
of Sparse Subspace Clustering (SSC) and Algebraic Subspace Clustering (ASC),
previously known as GPCA.^2 On the one hand, theoretical results for SSC assume that
the subspace dimensions are small compared to the dimension of the ambient space.
Furthermore, SSC is known to be very robust in the presence of noise in the data.
On the other hand, theoretical results for ASC are valid for subspaces of arbitrary
dimensions, with the easiest case being that of hyperplanes, provided that an upper
bound on the number of subspaces is known. However, all known implementations
of ASC for subspaces of different dimensions, including the recursive algorithm
proposed in [16], are very sensitive to noise and are thus considered impractical. As a
consequence, our motivation for this work is to develop an algorithm that enjoys the
strong theoretical guarantees associated with ASC but is also robust to noise.

1.3. Paper contributions. This paper features two main contributions. 

As a first contribution, we propose a new ASC algorithm, called Filtrated Algebraic
Subspace Clustering (FASC), which can handle an unknown number of subspaces of
possibly high and different dimensions, and give a rigorous proof of its correctness.^3
Our algorithm solves the following problem:

Definition 1 (Algebraic subspace clustering problem). Given a finite set of
points $\mathcal{X} = \{\boldsymbol{x}_1, \dots, \boldsymbol{x}_N\}$ lying in general position^4 inside a transversal subspace
arrangement^5 $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$, decompose $\mathcal{A}$ into its irreducible components, i.e., find the
number of subspaces $n$ and a basis for each subspace $\mathcal{S}_i$, $i = 1, \dots, n$.

Our algorithm approaches this problem by selecting a suitable polynomial vanishing
on the subspace arrangement $\mathcal{A}$. The gradient of this polynomial at a point
$\boldsymbol{x} \in \mathcal{A}$ gives the normal vector to a hyperplane $\mathcal{V}_1$ containing the subspace $\mathcal{S}$ passing
through the point. By intersecting the subspace arrangement with the hyperplane,
we obtain a subspace sub-arrangement $\mathcal{A}_1 \subset \mathcal{A}$, which lives in an ambient space $\mathcal{V}_1$
of dimension one less than the original ambient dimension and still contains $\mathcal{S}$. By
choosing another suitable polynomial that vanishes on $\mathcal{A}_1$, computing the gradient of
this new polynomial at the same point, intersecting again with the new hyperplane $\mathcal{V}_2$,
and so on, we obtain a descending filtration $\mathcal{V}_1 \supset \mathcal{V}_2 \supset \cdots \supset \mathcal{S}$ of subspace
arrangements, which eventually gives us the subspace $\mathcal{S}$ containing the point. This happens
precisely after $c$ steps, where $c$ is the codimension of $\mathcal{S}$, when no non-trivial vanishing
polynomial exists, and the ambient space $\mathcal{V}_c$, which is the orthogonal complement
of the span of all the gradients used in the filtration, can be identified with $\mathcal{S}$. By
repeating this procedure at another point not in the first subspace, we can identify
the second subspace and so on, until all subspaces have been identified. Using results
from algebraic geometry, we rigorously prove that this algorithm correctly identifies
the number of subspaces, their dimensions and a basis for each subspace.

^2 Following the convention introduced in [42], we have taken the liberty to change the name from
GPCA to ASC for two reasons. First, to have a consistent naming convention across many subspace
clustering algorithms, such as ASC, SSC, LRSC, in which each name indicates the type of the algorithm
(algebraic, sparse, low-rank). Second, we believe that GPCA is a more general name that is best suited
for the entire family of subspace clustering algorithms, which are all generalizations of PCA.

^3 Partial results from the present paper have been presented without proofs in [32].

^4 We will define formally the notion of points in general position in Definition 12.

^5 We will define formally the notion of a transversal subspace arrangement in Definition 4.

As a second contribution, we extend the ideas behind the purely abstract FASC
algorithm to a working algorithm called Filtrated Spectral Algebraic Subspace Clustering
(FSASC), which is suitable for computations with noisy data.^6 The first modification
is that intersections with hyperplanes are replaced by projections onto them. In this
way, points in the subspace contained by the hyperplane are preserved by the
projection, while other points are generally shrunk. The second modification is that we
compute a filtration at each data point and use the norm of point $\boldsymbol{x}_{j'}$ at the end of the
filtration associated to point $\boldsymbol{x}_j$ to define an affinity between these two points. The
intuition is that the filtration associated to point $\boldsymbol{x}_j$ will in theory preserve the norms
of all points lying in the same subspace as $\boldsymbol{x}_j$. This process leads to an affinity matrix
of high intra-class and low cross-class connectivity, upon which spectral clustering is
applied to obtain the clustering of the data. By experiments on real and synthetic
data we demonstrate that the idea of filtrations leads to affinity matrices of superior
quality, i.e., affinities with high intra- and low inter-cluster connectivity, and as a
result to better clustering accuracy. In particular, FSASC is shown to be superior to
state-of-the-art methods in the problem of motion segmentation using the Hopkins155
dataset [31].

Finally, we have taken the liberty of presenting in an appendix the foundations 
of the algebraic geometric theory of subspace arrangements relevant to Algebraic 
Subspace Clustering, in a manner that is both rigorous and accessible to the interested 
audience outside the algebraic geometry community, thus complementing existing 
reviews such as [25]. 

1.4. Notation. For any positive integer $n$, we define $[n] := \{1, 2, \dots, n\}$. We
denote by $\mathbb{R}$ the real numbers. The right null space of a matrix $B$ is denoted by $\mathcal{N}(B)$.
If $\mathcal{S}$ is a subspace of $\mathbb{R}^D$, then $\dim(\mathcal{S})$ denotes the dimension of $\mathcal{S}$ and $\pi_{\mathcal{S}} : \mathbb{R}^D \to \mathcal{S}$
is the orthogonal projection of $\mathbb{R}^D$ onto $\mathcal{S}$. The symbol $\oplus$ denotes direct sum of
subspaces. We denote the orthogonal complement of a subspace $\mathcal{S}$ in $\mathbb{R}^D$ by $\mathcal{S}^\perp$.
If $y_1, \dots, y_s$ are elements of $\mathbb{R}^D$, we denote by $\mathrm{Span}(y_1, \dots, y_s)$ the subspace of
$\mathbb{R}^D$ spanned by these elements. For two vectors $x, y \in \mathbb{R}^D$, the notation $x \cong y$
means that $x$ and $y$ are colinear. We let $\mathbb{R}[x] = \mathbb{R}[x_1, \dots, x_D]$ be the polynomial
ring over the real numbers in $D$ indeterminates. We use $x$ to denote the vector of
indeterminates $x = (x_1, \dots, x_D)^\top$, while we reserve $\boldsymbol{x}$ to denote a data point
$\boldsymbol{x} = (\chi_1, \dots, \chi_D)^\top$ of $\mathbb{R}^D$. We denote by $\mathbb{R}[x]_\ell$ the set of all homogeneous^7 polynomials
of degree $\ell$ and similarly by $\mathbb{R}[x]_{\leq \ell}$ the set of all polynomials of degree
less than or equal to $\ell$. $\mathbb{R}[x]$ is an infinite dimensional real vector space, while
$\mathbb{R}[x]_\ell$ and $\mathbb{R}[x]_{\leq \ell}$ are finite dimensional subspaces of $\mathbb{R}[x]$ of dimensions
$\mathcal{M}_\ell(D) := \binom{\ell + D - 1}{\ell}$ and $\mathcal{M}_\ell(D+1)$ respectively. We denote by $\mathbb{R}(x)$ the field of all
rational functions over $\mathbb{R}$ and indeterminates $x_1, \dots, x_D$. If $\{p_1, \dots, p_s\}$ is a subset of
$\mathbb{R}[x]$, we denote by $\langle p_1, \dots, p_s \rangle$ the ideal generated by $p_1, \dots, p_s$ (see Definition 29).
If $\mathcal{A}$ is a subset of $\mathbb{R}^D$, we denote by $\mathcal{I}_{\mathcal{A}}$ the vanishing ideal of $\mathcal{A}$, i.e., the set of all
elements of $\mathbb{R}[x]$ that vanish on $\mathcal{A}$, and similarly $\mathcal{I}_{\mathcal{A},\ell} := \mathcal{I}_{\mathcal{A}} \cap \mathbb{R}[x]_\ell$ and
$\mathcal{I}_{\mathcal{A},\leq \ell} := \mathcal{I}_{\mathcal{A}} \cap \mathbb{R}[x]_{\leq \ell}$. Finally, for a point $\boldsymbol{x} \in \mathbb{R}^D$ and a set $\mathcal{I} \subset \mathbb{R}[x]$ of
polynomials, $\nabla \mathcal{I}|_{\boldsymbol{x}}$ is the set of gradients of all the elements of $\mathcal{I}$ evaluated at $\boldsymbol{x}$.

^6 A preliminary description of this method appeared in a workshop paper [33].

^7 A polynomial in many variables is called homogeneous if all monomials appearing in the
polynomial have the same degree.
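
As a small illustration of the dimension counts in the notation above, the following Python sketch evaluates the number of monomials of a given degree. The function names `veronese_dim` and `dim_up_to` are hypothetical and chosen only for this example; the second function sums the homogeneous counts, which should agree with the count of degree-$\ell$ monomials in one extra variable.

```python
from math import comb


def veronese_dim(ell: int, D: int) -> int:
    """Number of monomials of degree exactly ell in D variables."""
    return comb(ell + D - 1, ell)


def dim_up_to(ell: int, D: int) -> int:
    """Dimension of the space of polynomials of degree <= ell in D variables,
    obtained by summing the homogeneous pieces of degree 0, 1, ..., ell."""
    return sum(veronese_dim(k, D) for k in range(ell + 1))


if __name__ == "__main__":
    for ell in range(1, 5):
        # Same ambient dimension D = 3 as the running example of the paper.
        print(ell, veronese_dim(ell, 3), dim_up_to(ell, 3))
```

The quick growth of these counts with the degree is what later makes large upper bounds on the number of subspaces computationally expensive.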

1.5. Paper organization. The remainder of the paper is organized as follows: 
section 2 provides a careful, yet concise review of the state-of-the-art in algebraic 
subspace clustering. In section 3 we discuss the FASC algorithm from a geometric 
viewpoint with as few technicalities as possible. Throughout sections 2 and 3, we
use a running example of two lines and a plane in $\mathbb{R}^3$ to illustrate various ideas; the
reader is encouraged to follow these illustrations. We save the rigorous treatment of 
FASC for section 4, which consists of the technical heart of the paper. In particular, 
the listing of the FASC algorithm can be found in Algorithm 3 and the theorem 
establishing its correctness is Theorem 28. In section 5 we describe FSASC, which is 
the numerical adaptation of FASC, and compare it to other state-of-the-art subspace 
clustering algorithms using both synthetic and real data. Finally, appendices A, B 
and C cover basic notions and results from commutative algebra, algebraic geometry 
and subspace arrangements respectively, mainly used throughout section 4. 

2. Review of Algebraic Subspace Clustering (ASC). This section reviews 
the main ideas behind ASC. For the sake of simplicity, we first discuss ASC in the case 
of hyperplanes (section 2.1) and subspaces of equal dimension (section 2.2), for which 
a closed form solution can be found using a single polynomial. In the case of subspaces 
of arbitrary dimensions, the picture becomes more involved, but a closed form solution 
from multiple polynomials is still available when the number of subspaces n is known 
(section 2.3) or an upper bound m for n is known (section 2.4). In section 2.5 we 
discuss one limitation of ASC due to computational complexity and a partial solution 
based on a recursive ASC algorithm. In section 2.6 we discuss another limitation of 
ASC due to sensitivity to noise and a practical solution based on spectral clustering. 
We conclude in section 2.7 with the main challenge that this paper aims to address. 

2.1. Subspaces of codimension 1. The basic principles of ASC can be introduced
more smoothly by considering the case where the union of subspaces is the
union of $n$ hyperplanes $\mathcal{A} = \bigcup_{i=1}^n \mathcal{H}_i$ of $\mathbb{R}^D$. Each hyperplane $\mathcal{H}_i$ is uniquely defined
by its unit length normal vector $b_i \in \mathbb{R}^D$ as $\mathcal{H}_i = \{\boldsymbol{x} \in \mathbb{R}^D : b_i^\top \boldsymbol{x} = 0\}$. In the
language of algebraic geometry this is equivalent to saying that $\mathcal{H}_i$ is the zero set of the
polynomial $b_i^\top x$ or equivalently $\mathcal{H}_i$ is the algebraic variety defined by the polynomial
equation $b_i^\top x = 0$, where $b_i^\top x = b_{i,1} x_1 + \cdots + b_{i,D} x_D$ with $b_i := (b_{i,1}, \dots, b_{i,D})^\top$,
$x := (x_1, \dots, x_D)^\top$. We write this more succinctly as $\mathcal{H}_i = \mathcal{Z}(b_i^\top x)$. We then observe
that a point $\boldsymbol{x}$ of $\mathbb{R}^D$ belongs to $\bigcup_{i=1}^n \mathcal{H}_i$ if and only if $\boldsymbol{x}$ is a root of the polynomial
$p(x) = (b_1^\top x) \cdots (b_n^\top x)$, i.e., the union of hyperplanes $\mathcal{A}$ is the algebraic variety
$\mathcal{A} = \mathcal{Z}(p)$ (the zero set of $p$). Notice the important fact that $p$ is homogeneous of
degree equal to the number $n$ of distinct hyperplanes and moreover it is the product
of linear homogeneous polynomials $b_i^\top x$, i.e., a product of linear forms, each of which
defines a distinct hyperplane $\mathcal{H}_i$ via the corresponding normal vector $b_i$.

Given a set of points $\mathcal{X} = \{\boldsymbol{x}_j\}_{j=1}^N \subset \mathcal{A}$ in general position in the union of
hyperplanes, the classic polynomial differentiation algorithm proposed in [39, 41] recovers
the correct number of hyperplanes as well as their normal vectors by

1. embedding the data into a higher-dimensional space via a polynomial map,

2. finding the number of subspaces by analyzing the rank of the embedded data
matrix,

3. finding the polynomial $p$ from the null space of the embedded data matrix,

4. finding the hyperplane normal vectors from the derivatives of $p$ at a nonsingular
point $\boldsymbol{x}$ of $\mathcal{A}$.^8

More specifically, observe that the polynomial $p(x) = (b_1^\top x) \cdots (b_n^\top x)$ can be written
as a linear combination of the set of all monomials of degree $n$ in $D$ variables,
$\{x_1^n, x_1^{n-1} x_2, x_1^{n-1} x_3, \dots, x_1 x_D^{n-1}, \dots, x_D^n\}$, as:

(1)    $p(x) = \sum_{n_1 + n_2 + \cdots + n_D = n} c_{n_1, n_2, \dots, n_D} \, x_1^{n_1} x_2^{n_2} \cdots x_D^{n_D} = c^\top \nu_n(x).$

In the above expression, $c \in \mathbb{R}^{\mathcal{M}_n(D)}$ is the vector of all coefficients $c_{n_1, n_2, \dots, n_D}$, and $\nu_n$
is the Veronese or Polynomial embedding of degree $n$, as it is known in the algebraic
geometry and machine learning literature, respectively. It is defined by taking a point
of $\mathbb{R}^D$ to a point of $\mathbb{R}^{\mathcal{M}_n(D)}$ under the rule

(2)    $(x_1, \dots, x_D)^\top \mapsto (x_1^n, x_1^{n-1} x_2, x_1^{n-1} x_3, \dots, x_1 x_D^{n-1}, \dots, x_D^n)^\top,$

where $\mathcal{M}_n(D)$ is the dimension of the space of homogeneous polynomials of degree $n$
in $D$ indeterminates. The image of the data set $\mathcal{X}$ under the Veronese embedding is
used to form the so-called embedded data matrix

(3)    $\nu_\ell(\mathcal{X}) := [\nu_\ell(\boldsymbol{x}_1) \ \cdots \ \nu_\ell(\boldsymbol{x}_N)]^\top.$

It is shown in [41] that when there are sufficiently many data points that are sufficiently
well distributed in the subspaces, the correct number of hyperplanes is the smallest
degree $\ell$ for which $\nu_\ell(\mathcal{X})$ drops rank by 1: $n = \min_{\ell \geq 1} \{\ell : \mathrm{rank}(\nu_\ell(\mathcal{X})) = \mathcal{M}_\ell(D) - 1\}$.
Moreover, it is shown in [41] that the polynomial vector of coefficients $c$ is the unique
up to scale vector in the one-dimensional null space of $\nu_n(\mathcal{X})$.
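
The embedding, rank test and null-space computation described above can be sketched numerically as follows. This is only an illustrative sketch under the assumption of noise-free data: the helper names (`veronese`, `sample_hyperplanes`, `estimate_num_hyperplanes`, `vanishing_coefficients`) are hypothetical, the rank is decided by NumPy's default numerical tolerance, and the monomial ordering is the lexicographic one produced by `itertools.combinations_with_replacement`.

```python
import itertools

import numpy as np


def veronese(X, degree):
    """Embedded data matrix: one row per point, one column per monomial of the given degree."""
    X = np.asarray(X, dtype=float)
    D = X.shape[1]
    cols = [np.prod(X[:, list(idx)], axis=1)
            for idx in itertools.combinations_with_replacement(range(D), degree)]
    return np.column_stack(cols)


def sample_hyperplanes(normals, pts_per_plane, rng):
    """Noise-free random points on each hyperplane {x : b^T x = 0}."""
    pts = []
    for b in normals:
        b = b / np.linalg.norm(b)
        Y = rng.standard_normal((pts_per_plane, b.size))
        pts.append(Y - np.outer(Y @ b, b))  # remove the component along b
    return np.vstack(pts)


def estimate_num_hyperplanes(X, max_degree=6):
    """Smallest degree at which the embedded data matrix drops rank by exactly one."""
    for ell in range(1, max_degree + 1):
        V = veronese(X, ell)
        if np.linalg.matrix_rank(V) == V.shape[1] - 1:
            return ell
    return None


def vanishing_coefficients(X, n):
    """Coefficient vector c spanning the (one-dimensional) null space of the embedded data."""
    _, _, Vt = np.linalg.svd(veronese(X, n))
    return Vt[-1]


rng = np.random.default_rng(0)
normals = rng.standard_normal((3, 3))       # three hyperplanes in R^3 (assumed example)
X = sample_hyperplanes(normals, 50, rng)
n_hat = estimate_num_hyperplanes(X)
c = vanishing_coefficients(X, n_hat)
print("estimated number of hyperplanes:", n_hat)
print("max |p(x_j)| over the data:", np.abs(veronese(X, n_hat) @ c).max())
```

With noisy data the exact rank test is no longer reliable, which is the sensitivity issue the paper returns to in section 2.6.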

It follows that the task of identifying the normals to the hyperplanes from $p$ is
equivalent to extracting the linear factors of $p$. This is achieved^9 by observing that if
we have a point $\boldsymbol{x} \in \mathcal{H}_i - \bigcup_{i' \neq i} \mathcal{H}_{i'}$, then the gradient $\nabla p|_{\boldsymbol{x}}$ of $p$ evaluated at $\boldsymbol{x}$

(4)    $\nabla p|_{\boldsymbol{x}} = \sum_{i=1}^n b_i \prod_{i' \neq i} (b_{i'}^\top \boldsymbol{x})$

is equal to $b_i$ up to a scale factor because $b_i^\top \boldsymbol{x} = 0$ and hence all the terms in the sum
vanish except for the $i$-th (see Proposition 56 for a more general statement). Having
identified the normal vectors, the task of clustering the points in $\mathcal{X}$ is straightforward.
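
The following Python sketch illustrates the gradient argument numerically on noise-free data: it fits the vanishing polynomial from the null space of the embedded data matrix, differentiates it monomial by monomial, and compares the normalized gradient at a data point with the normal of the hyperplane that point was drawn from. The helpers `exponents` and `poly_gradient` and the specific sizes are assumptions made for this example.

```python
import itertools

import numpy as np


def exponents(D, n):
    """Exponent vectors of all degree-n monomials in D variables (one row per monomial)."""
    return np.array([np.bincount(idx, minlength=D)
                     for idx in itertools.combinations_with_replacement(range(D), n)])


def poly_gradient(c, E, x):
    """Gradient at x of p(x) = sum_k c[k] * prod_d x[d] ** E[k, d]."""
    g = np.zeros(x.size)
    for d in range(x.size):
        Ed = E.copy()
        Ed[:, d] = np.maximum(Ed[:, d] - 1, 0)          # lower the exponent of x_d by one
        g[d] = np.sum(c * E[:, d] * np.prod(x ** Ed, axis=1))
    return g


rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
B /= np.linalg.norm(B, axis=1, keepdims=True)            # unit normals b_1, b_2, b_3

# 40 noise-free points on each hyperplane; the first 40 rows lie on hyperplane 1.
X = np.vstack([Y - np.outer(Y @ b, b)
               for b in B for Y in [rng.standard_normal((40, 3))]])

E = exponents(3, 3)
V = np.prod(X[:, None, :] ** E[None, :, :], axis=2)      # embedded data matrix
c = np.linalg.svd(V)[2][-1]                              # coefficients of the vanishing polynomial

g = poly_gradient(c, E, X[0])
g /= np.linalg.norm(g)
print("|cos| between gradient and b_1:", abs(g @ B[0]))
```

Because the gradient is only determined up to scale and sign, the comparison uses the absolute value of the inner product of unit vectors.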


^8 A nonsingular point of a subspace arrangement is a point that lies in one and only one of the
subspaces that constitute the arrangement.

^9 A direct factorization has been shown to be possible as well [40]; however this approach has not
been generalized yet to the case of subspaces of different dimensions.

2.2. Subspaces of equal dimension. Let us now consider a more general case,
where we know that the subspaces are of equal and known dimension $d$. Such a case
can be reduced to the case of hyperplanes, by noticing that a union of $n$ subspaces of
dimension $d$ of $\mathbb{R}^D$ becomes a union of hyperplanes of $\mathbb{R}^{d+1}$ after a generic projection
$\pi_d : \mathbb{R}^D \to \mathbb{R}^{d+1}$. We note that any random orthogonal projection will almost surely
preserve the number of subspaces and their dimensions, as the set of projections
$\pi_d$ that do not have this preserving property is a zero measure subset of the set of
orthogonal projections $\{\pi_d \in \mathbb{R}^{(d+1) \times D} : \pi_d \pi_d^\top = I_{(d+1) \times (d+1)}\}$.

When the common dimension $d$ is unknown, it can be estimated exactly by
analyzing the right null space of the embedded data matrix, after projecting the data
generically onto subspaces of dimension $d' + 1$, with $d' = D - 1, D - 2, \dots$ [35]. More
specifically, when $d' > d$, we have that $\dim \mathcal{N}(\nu_n(\pi_{d'}(\mathcal{X}))) > 1$, while when $d' < d$ we
have $\dim \mathcal{N}(\nu_n(\pi_{d'}(\mathcal{X}))) = 0$. On the other hand, the case $d' = d$ is the only case for
which the null space is one-dimensional, and so $d = \{d' : \dim \mathcal{N}(\nu_n(\pi_{d'}(\mathcal{X}))) = 1\}$.

Finally, when both $n$ and $d$ are unknown, one can first recover $d$ as the smallest $d'$
such that there exists an $\ell$ for which $\dim \mathcal{N}(\nu_\ell(\pi_{d'}(\mathcal{X}))) > 0$, and subsequently recover
$n$ as the smallest $\ell$ such that $\dim \mathcal{N}(\nu_\ell(\pi_d(\mathcal{X}))) > 0$; see [35] for further details.
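
A minimal numerical sketch of the projection argument follows, assuming noise-free data, a known number of subspaces, and a random projection with orthonormal rows obtained from a QR factorization. The names `random_projection`, `veronese` and `nullity`, as well as the sizes $D = 5$, $d = 2$, $n = 2$, are choices made only for this illustration.

```python
import itertools

import numpy as np


def random_projection(D, k, rng):
    """Random k x D matrix P with orthonormal rows, i.e. P @ P.T = I_k."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, k)))
    return Q.T


def veronese(X, degree):
    """Embedded data matrix (rows: points, columns: monomials of the given degree)."""
    D = X.shape[1]
    cols = [np.prod(X[:, list(idx)], axis=1)
            for idx in itertools.combinations_with_replacement(range(D), degree)]
    return np.column_stack(cols)


def nullity(A):
    """Dimension of the right null space, using NumPy's default rank tolerance."""
    return A.shape[1] - np.linalg.matrix_rank(A)


rng = np.random.default_rng(2)
D, d, n = 5, 2, 2
# 30 points on each of n random d-dimensional subspaces of R^D.
X = np.vstack([rng.standard_normal((30, d)) @ rng.standard_normal((d, D))
               for _ in range(n)])

results = {}
for d_prime in range(D - 1, 0, -1):
    P = random_projection(D, d_prime + 1, rng)
    results[d_prime] = nullity(veronese(X @ P.T, n))
    print("d' =", d_prime, " null space dimension =", results[d_prime])

d_hat = [dp for dp, k in results.items() if k == 1]
print("estimated common dimension:", d_hat)
```

The estimate is the unique $d'$ whose projected embedded data matrix has a one-dimensional null space, mirroring the criterion stated above.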

2.3. Known number of subspaces of arbitrary dimensions. When the
dimensions of the subspaces are unknown and arbitrary, the problem becomes much
more complicated, even if the number $n$ of subspaces is known, which is the case
examined in this subsection. In such a case, a union of subspaces $\mathcal{A} = \mathcal{S}_1 \cup \cdots \cup \mathcal{S}_n$
of $\mathbb{R}^D$, henceforth called a subspace arrangement, is still an algebraic variety. The
main difference with the case of hyperplanes is that, in general, multiple polynomials
of degree $n$ are needed to define $\mathcal{A}$, i.e., $\mathcal{A}$ is the zero set of a finite collection of
homogeneous polynomials of degree $n$ in $D$ indeterminates.

Example 2. Consider the union $\mathcal{A}$ of a plane $\mathcal{S}_1$ and two lines $\mathcal{S}_2, \mathcal{S}_3$ in general
position in $\mathbb{R}^3$ (Fig. 1). Then $\mathcal{A} = \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3$ is the zero set of the degree-3
homogeneous polynomials

(5)    $p_1 := (b_1^\top x)(b_{2,1}^\top x)(b_{3,1}^\top x), \quad p_2 := (b_1^\top x)(b_{2,1}^\top x)(b_{3,2}^\top x),$

(6)    $p_3 := (b_1^\top x)(b_{2,2}^\top x)(b_{3,1}^\top x), \quad p_4 := (b_1^\top x)(b_{2,2}^\top x)(b_{3,2}^\top x),$

where $b_1$ is the normal vector to the plane $\mathcal{S}_1$ and $b_{i,j}$, $j = 1, 2$, are two linearly
independent vectors that are orthogonal to the line $\mathcal{S}_i$, $i = 2, 3$. These polynomials
are linearly independent and form a basis for the vector space $\mathcal{I}_{\mathcal{A},3}$ of the degree-3
homogeneous polynomials that vanish on $\mathcal{A}$.^10

Fig. 1. A union of two lines and one plane in general position in $\mathbb{R}^3$.

^10 The interested reader is encouraged to prove this claim.

In contrast to the case of hyperplanes, when the subspace dimensions are different,
there may exist vanishing polynomials of degree strictly less than the number of
subspaces.

Example 3. Consider the setting of Example 2. Then there exists a unique up
to scale vanishing polynomial of degree 2, which is the product of two linear forms:
one form is $b_1^\top x$, where $b_1$ is the normal to the plane $\mathcal{S}_1$, and the other linear form
is $f^\top x$, where $f$ is the normal to the plane defined by the lines $\mathcal{S}_2$ and $\mathcal{S}_3$ (Fig. 2).

Fig. 2. $\mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3$: $b_1$ is the normal vector to plane $\mathcal{S}_1$ and $f$ is the normal vector to the plane $\mathcal{H}_{23}$
spanned by lines $\mathcal{S}_2$ and $\mathcal{S}_3$.

As Example 2 shows, all the relevant geometric information is still encoded in
the factors of some special basis^11 of $\mathcal{I}_{\mathcal{A},n}$ that consists of degree-$n$ homogeneous
polynomials that factorize into the product of linear forms. However, computing such
a basis remains, to the best of our knowledge, an unsolved problem. Instead, one can
only rely on computing (or be given) a general basis for the vector space $\mathcal{I}_{\mathcal{A},n}$. In our
example such a basis could be

(7)    $p_1 + p_4, \quad p_1 - p_4, \quad p_2 + p_3, \quad p_2 - p_3$

and it can be seen that none of these polynomials is factorizable into the product
of linear forms. This difficulty was not present in the case of hyperplanes, because
there was only one vanishing polynomial (up to scale) of degree $n$ and it had to be
factorizable.

In spite of this difficulty, a solution can still be achieved in an elegant fashion
by resorting to polynomial differentiation. The key fact that allows this approach
is that any homogeneous polynomial $p$ of degree $n$ that vanishes on the subspace
arrangement $\mathcal{A}$ is a linear combination of vanishing polynomials, each of which is a
product of linear forms, with each distinct subspace contributing a vanishing linear
form in every product (Theorem 58). As a consequence (Proposition 56), the gradient
of $p$ evaluated at some point $\boldsymbol{x} \in \mathcal{S}_i - \bigcup_{i' \neq i} \mathcal{S}_{i'}$ lies in $\mathcal{S}_i^\perp$ and the linear span of the
gradients at $\boldsymbol{x}$ of all such $p$ is precisely equal to $\mathcal{S}_i^\perp$. We can thus recover $\mathcal{S}_i$, remove
it from $\mathcal{A}$ and then repeat the procedure to identify all the remaining subspaces.
As stated in Theorem 6, this process is provably correct as long as the subspace
arrangement $\mathcal{A}$ is transversal, as defined next.

Definition 4 (Transversal subspace arrangement [5]). A subspace arrangement
$\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i \subset \mathbb{R}^D$ is called transversal, if for any subset $\mathfrak{I}$ of $[n]$, the codimension of
$\bigcap_{i \in \mathfrak{I}} \mathcal{S}_i$ is the minimum between $D$ and the sum of the codimensions of all $\mathcal{S}_i$, $i \in \mathfrak{I}$.

^11 Strictly speaking, this is not always true. However, it is true if the subspace arrangement is
general enough, in particular if it is transversal; see Definition 4 and Theorem 58.

Remark 5. Transversality is a geometric condition on the subspaces, which in 
particular requires the dimensions of all possible intersections among subspaces to be 
as small as the dimensions of the subspaces allow (see Appendix C for a discussion). 
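
The transversality condition can be checked numerically when each subspace is given through a basis of its orthogonal complement. The sketch below is only an illustration: the function name `is_transversal` and the representation of each subspace by a matrix whose rows span its orthogonal complement are assumptions of this example, and ranks are computed with NumPy's default tolerance.

```python
import itertools

import numpy as np


def is_transversal(normal_bases, D):
    """normal_bases[i] is a (c_i x D) matrix whose rows span the orthogonal
    complement of the i-th subspace, so the subspace is the null space of that matrix.
    The intersection over a subset is the null space of the stacked matrices,
    hence its codimension is the rank of the stack."""
    codims = [np.linalg.matrix_rank(B) for B in normal_bases]
    n = len(normal_bases)
    for r in range(2, n + 1):
        for subset in itertools.combinations(range(n), r):
            expected = min(D, sum(codims[i] for i in subset))
            stacked = np.vstack([normal_bases[i] for i in subset])
            if np.linalg.matrix_rank(stacked) != expected:
                return False
    return True


rng = np.random.default_rng(5)

# A random plane and two random lines in R^3 (the running example of the paper).
plane = rng.standard_normal((1, 3))
line2 = rng.standard_normal((2, 3))
line3 = rng.standard_normal((2, 3))
print(is_transversal([plane, line2, line3], D=3))

# Three planes in R^3 that share a common line: the triple intersection
# has codimension 2 instead of min(3, 1 + 1 + 1) = 3.
b1, b2 = rng.standard_normal((2, 3))
print(is_transversal([b1[None, :], b2[None, :], (b1 + b2)[None, :]], D=3))
```

The second configuration fails the test because the three normals are linearly dependent, so the intersection is larger than the dimensions alone would force.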

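Theorem 6 (ASC by polynomial differentiation when n is known, [41, 25]). Let
$\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a transversal subspace arrangement of $\mathbb{R}^D$, let $\boldsymbol{x} \in \mathcal{S}_i - \bigcup_{i' \neq i} \mathcal{S}_{i'}$ be a
nonsingular point in $\mathcal{A}$, and let $\mathcal{I}_{\mathcal{A},n}$ be the vector space of all degree-$n$ homogeneous
polynomials that vanish on $\mathcal{A}$. Then $\mathcal{S}_i$ is the orthogonal complement of the subspace
spanned by all vectors of the form $\nabla p|_{\boldsymbol{x}}$, where $p \in \mathcal{I}_{\mathcal{A},n}$, i.e., $\mathcal{S}_i = \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},n}|_{\boldsymbol{x}})^\perp$.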

Theorem 6 and its proof are illustrated in the next example. 

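Example 7. Consider Example 2 and recall that $p_1 = (b_1^\top x)(b_{2,1}^\top x)(b_{3,1}^\top x)$,
$p_2 = (b_1^\top x)(b_{2,1}^\top x)(b_{3,2}^\top x)$, $p_3 = (b_1^\top x)(b_{2,2}^\top x)(b_{3,1}^\top x)$, and
$p_4 = (b_1^\top x)(b_{2,2}^\top x)(b_{3,2}^\top x)$. Let $\boldsymbol{x}_2$ be a generic point in $\mathcal{S}_2 - \mathcal{S}_1 \cup \mathcal{S}_3$. Then

(8)    $\nabla p_1|_{\boldsymbol{x}_2} \cong \nabla p_2|_{\boldsymbol{x}_2} \cong b_{2,1}, \quad \nabla p_3|_{\boldsymbol{x}_2} \cong \nabla p_4|_{\boldsymbol{x}_2} \cong b_{2,2}.$

Hence $b_{2,1}, b_{2,2} \in \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},3}|_{\boldsymbol{x}_2})$ and so $\mathcal{S}_2 \supset \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},3}|_{\boldsymbol{x}_2})^\perp$. Conversely, let
$p \in \mathcal{I}_{\mathcal{A},3}$. Then there exist $\alpha_i \in \mathbb{R}$, $i = 1, \dots, 4$, such that $p = \sum_{i=1}^4 \alpha_i p_i$ and so

(9)    $\nabla p|_{\boldsymbol{x}_2} = \sum_{i=1}^4 \alpha_i \nabla p_i|_{\boldsymbol{x}_2} \in \mathrm{Span}(b_{2,1}, b_{2,2}) = \mathcal{S}_2^\perp.$

Hence $\nabla \mathcal{I}_{\mathcal{A},3}|_{\boldsymbol{x}_2} \subset \mathcal{S}_2^\perp$, and so $\mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},3}|_{\boldsymbol{x}_2})^\perp \supset \mathcal{S}_2$.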
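
The computation in this example can be checked numerically. The sketch below assumes randomly drawn normals for the plane and the two lines, builds the four factorizable cubics of Example 2 as products of linear forms, and differentiates them with the product rule; the helper name `grad_product_of_forms` is hypothetical.

```python
import numpy as np


def grad_product_of_forms(forms, x):
    """Gradient at x of p(x) = prod_k (forms[k] @ x), computed with the product rule."""
    vals = forms @ x
    g = np.zeros(x.size)
    for k in range(len(forms)):
        g += forms[k] * np.prod(np.delete(vals, k))
    return g


rng = np.random.default_rng(3)
b1 = rng.standard_normal(3)          # normal to the plane S_1
b2 = rng.standard_normal((2, 3))     # rows b_{2,1}, b_{2,2}: normals to the line S_2
b3 = rng.standard_normal((2, 3))     # rows b_{3,1}, b_{3,2}: normals to the line S_3

x2 = np.cross(b2[0], b2[1])          # a point spanning the line S_2

# p_1, ..., p_4 in the order of Example 2: (b_1)(b_{2,j})(b_{3,k}) for j, k in {1, 2}.
basis = [np.array([b1, b2[j], b3[k]]) for j in range(2) for k in range(2)]
G = np.array([grad_product_of_forms(F, x2) for F in basis])

print("rank of the gradients:", np.linalg.matrix_rank(G))
print("gradients orthogonal to S_2:", np.allclose(G @ x2, 0))
```

The rank of the stacked gradients equals the codimension of the line, and the orthogonal complement of their span is the line itself, as the example asserts.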

2.4. Unknown number of subspaces of arbitrary dimensions. As it turns
out, when the number of subspaces $n$ is unknown, but an upper bound $m \geq n$ is given,
one can obtain the decomposition of the subspace arrangement from the gradients of
the vanishing polynomials of degree $m$, precisely as in Theorem 6, simply by replacing
$n$ with $m$.

Theorem 8 (ASC by polynomial differentiation when an upper bound on n is
known, [41, 25]). Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a transversal subspace arrangement of $\mathbb{R}^D$, let
$\boldsymbol{x} \in \mathcal{S}_i - \bigcup_{i' \neq i} \mathcal{S}_{i'}$ be a nonsingular point in $\mathcal{A}$, and let $\mathcal{I}_{\mathcal{A},m}$ be the vector space of
all degree-$m$ homogeneous polynomials that vanish on $\mathcal{A}$, where $m \geq n$. Then $\mathcal{S}_i$ is
the orthogonal complement of the subspace spanned by all vectors of the form $\nabla p|_{\boldsymbol{x}}$,
where $p \in \mathcal{I}_{\mathcal{A},m}$, i.e., $\mathcal{S}_i = \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},m}|_{\boldsymbol{x}})^\perp$.

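Example 9. Consider the setting of Examples 2 and 3. Suppose that we have the
upper bound $m = 4$ on the number of underlying subspaces ($n = 3$). It can be shown
that the vector space $\mathcal{I}_{\mathcal{A},4}$ has^12 dimension 8 and is spanned by the polynomials

(10)    $q_1 = (b_1^\top x)(f^\top x)^3, \quad q_5 = (b_1^\top x)(f^\top x)(b_2^\top x)^2,$

(11)    $q_2 = (b_1^\top x)^2 (f^\top x)^2, \quad q_6 = (b_1^\top x)(b_3^\top x)^2 (f^\top x),$

(12)    $q_3 = (b_1^\top x)^3 (f^\top x), \quad q_7 = (b_1^\top x)(b_2^\top x)^2 (b_3^\top x),$

(13)    $q_4 = (b_1^\top x)(f^\top x)^2 (b_2^\top x), \quad q_8 = (b_1^\top x)(b_2^\top x)(b_3^\top x)^2,$

where $b_1$ is the normal to $\mathcal{S}_1$, $f$ is the normal to the plane defined by lines $\mathcal{S}_2$ and
$\mathcal{S}_3$, and $b_i$ is a normal to line $\mathcal{S}_i$ that is linearly independent from $f$, for $i = 2, 3$.
Hence $\mathcal{S}_1 = \mathrm{Span}(b_1)^\perp$ and $\mathcal{S}_i = \mathrm{Span}(f, b_i)^\perp$, $i = 2, 3$. Then for a generic point
$\boldsymbol{x}_2 \in \mathcal{S}_2 - \mathcal{S}_1 \cup \mathcal{S}_3$, we have that

(14)    $\nabla q_1|_{\boldsymbol{x}_2} = \nabla q_2|_{\boldsymbol{x}_2} = \nabla q_4|_{\boldsymbol{x}_2} = \nabla q_5|_{\boldsymbol{x}_2} = \nabla q_7|_{\boldsymbol{x}_2} = 0,$

(15)    $\nabla q_3|_{\boldsymbol{x}_2} \cong \nabla q_6|_{\boldsymbol{x}_2} \cong f, \quad \nabla q_8|_{\boldsymbol{x}_2} \cong b_2.$

Hence $f, b_2 \in \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},4}|_{\boldsymbol{x}_2})$ and so $\mathcal{S}_2 \supset \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},4}|_{\boldsymbol{x}_2})^\perp$. Similarly to Example
7, since every element of $\mathcal{I}_{\mathcal{A},4}$ is a linear combination of the $q_i$, $i = 1, \dots, 8$, we have
$\mathcal{S}_2 = \mathrm{Span}(\nabla \mathcal{I}_{\mathcal{A},4}|_{\boldsymbol{x}_2})^\perp$.

^12 This can be verified by applying the dimension formula of Corollary 3.4 in [5].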
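
The dimensions of the spaces of vanishing polynomials quoted in Examples 2, 3 and 9 (four cubics, one quadric up to scale, eight quartics) can be checked numerically from samples. The sketch below is an illustration that assumes noise-free random points on a random plane and two random lines through the origin of $\mathbb{R}^3$; the helper name `veronese` and the sample sizes are choices of this example, and the null-space dimension is computed with NumPy's default rank tolerance.

```python
import itertools

import numpy as np


def veronese(X, degree):
    """Embedded data matrix (rows: points, columns: monomials of the given degree)."""
    D = X.shape[1]
    cols = [np.prod(X[:, list(idx)], axis=1)
            for idx in itertools.combinations_with_replacement(range(D), degree)]
    return np.column_stack(cols)


rng = np.random.default_rng(4)
plane = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 3))   # points on S_1
line2 = rng.standard_normal((20, 1)) @ rng.standard_normal((1, 3))   # points on S_2
line3 = rng.standard_normal((20, 1)) @ rng.standard_normal((1, 3))   # points on S_3
X = np.vstack([plane, line2, line3])

null_dims = {}
for ell in (1, 2, 3, 4):
    V = veronese(X, ell)
    null_dims[ell] = V.shape[1] - np.linalg.matrix_rank(V)
    print("degree", ell, ": dimension of vanishing polynomials =", null_dims[ell])
```

For points in general position, these null-space dimensions coincide with the dimensions of the spaces of vanishing polynomials of the arrangement, which is the property that Remark 10 asks of the data.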

Remark 10. Notice that both Theorems 6 and 8 are statements about the abstract
subspace arrangement $\mathcal{A}$, i.e., no finite subset $\mathcal{X}$ of $\mathcal{A}$ is explicitly considered. To pass
from $\mathcal{A}$ to $\mathcal{X}$ and get similar theorems, we need to require $\mathcal{X}$ to be in general position
in $\mathcal{A}$, in some suitable sense. As one may suspect, this notion of general position must
entail that polynomials of degree $n$ for Theorem 6, or of degree $m$ for Theorem 8, that
vanish on $\mathcal{X}$ must also vanish on $\mathcal{A}$ and vice versa. In that case, we can compute
the required basis for $\mathcal{I}_{\mathcal{A},n}$, simply by computing a basis for $\mathcal{I}_{\mathcal{X},n}$, by means of the
Veronese embedding described in section 2.1, and similarly for $\mathcal{I}_{\mathcal{A},m}$. We will make
the notion of general position precise in Definition 12.

2.5. Computational complexity and recursive ASC. Although Theorem 8
is quite satisfactory from a theoretical point of view, using an upper bound $m \geq n$ for
the number of subspaces comes with the practical disadvantage that the dimension of
the Veronese embedding, $\mathcal{M}_m(D)$, grows exponentially with $m$. In addition, increasing
$m$ also increases the number of polynomials in the null space of $\nu_m(\mathcal{X})$, some of which
will eventually, as $m$ becomes large, be polynomials that simply fit the data $\mathcal{X}$ but
do not vanish on $\mathcal{A}$. To reduce the computational complexity of the polynomial
differentiation algorithm, one can consider vanishing polynomials of smaller degree,
$m < n$, as suggested by Example 3. While such vanishing polynomials may not be
sufficient to cluster the data into $n$ subspaces, they still provide a clustering of the
data into $m' \leq n$ subspaces. We can then look at each of these $m'$ clusters and see
if they can be partitioned further. For instance, in Example 3, we can first cluster
the data into two planes, the plane $\mathcal{S}_1$ and the plane $\mathcal{H}_{23}$ containing the two lines $\mathcal{S}_2$
and $\mathcal{S}_3$, and then partition the data lying in $\mathcal{H}_{23}$ into the two lines $\mathcal{S}_2$ and $\mathcal{S}_3$. This
leads to the recursive ASC algorithm proposed in [16, 41], which is based on finding
the polynomials of the smallest possible degree $m$ that vanish on the data, computing
the gradients of these vanishing polynomials to cluster the data into $m' \leq n$ groups,
and then repeating the procedure for each group until the data from each group
can be fit by polynomials of degree 1, in which case each group lies in a single linear
subspace. While this recursive ASC algorithm is very intuitive, no rigorous proof
of its correctness has appeared in the literature. In fact, there are examples where
this recursive method provably fails in the sense of producing ghost subspaces in the
decomposition of $\mathcal{A}$. For instance, when partitioning the data from Example 3 into
two planes $\mathcal{S}_1$ and $\mathcal{H}_{23}$, we may assign the data from the intersection of the two planes
to $\mathcal{H}_{23}$. If this is the case, when trying to partition further the data of $\mathcal{H}_{23}$, we will
obtain three lines: $\mathcal{S}_2$, $\mathcal{S}_3$ and the ghost line $\mathcal{S}_4 = \mathcal{S}_1 \cap \mathcal{H}_{23}$ (see Fig. 3(a)).

2.6. Instability in the presence of noise and spectral ASC. Another
important issue with Theorem 8 from a practical standpoint is its sensitivity to noise.
More precisely, when implementing Theorem 8 algorithmically, one is required to
estimate the dimension of the null space of $\nu_m(\mathcal{X})$, which is an extremely challenging
problem in the presence of noise. Moreover, small errors in the estimation of
$\dim \mathcal{N}(\nu_m(\mathcal{X}))$ have been observed to have dramatic effects on the quality of the
clustering, thus rendering algorithms that are directly based on Theorem 8 unstable.
While the recursive ASC algorithm of [16, 41] is more robust than such algorithms, it
is still sensitive to noise, as considerable errors may occur in the partitioning process.
Moreover, the performance of the recursive algorithm is always subject to degradation
due to the potential occurrence of ghost subspaces.

To enhance the robustness of ASC in the presence of noise and obtain a stable
working algebraic algorithm, the standard practice has been to apply a variation of the
polynomial differentiation algorithm based on spectral clustering [35]. More specifically,
given noisy data $\mathcal{X}$ lying close to a union of $n$ subspaces $\mathcal{A}$, one computes an
approximate vanishing polynomial $p$ whose coefficients are given by the right singular
vector of $\nu_n(\mathcal{X})$ corresponding to its smallest singular value. Given $p$, one computes
the gradient of $p$ at each point in $\mathcal{X}$ (which gives a normal vector associated with each
point in $\mathcal{X}$), and builds an affinity matrix between points $\boldsymbol{x}_j$ and $\boldsymbol{x}_{j'}$ as the cosine of
the angle between their corresponding normal vectors, i.e.,

(16)    $C_{jj'} = \left| \left\langle \frac{\nabla p|_{\boldsymbol{x}_j}}{\|\nabla p|_{\boldsymbol{x}_j}\|}, \frac{\nabla p|_{\boldsymbol{x}_{j'}}}{\|\nabla p|_{\boldsymbol{x}_{j'}}\|} \right\rangle \right|.$

This affinity is then used as input to any spectral clustering algorithm (see [44] for
a tutorial on spectral clustering) to obtain a clustering $\mathcal{X} = \bigcup_{i=1}^n \mathcal{X}_i$. We will refer to this
Spectral ASC method with angle-based affinity as SASC-A.

To gain some intuition about $C$, suppose that $\mathcal{A}$ is a union of $n$ hyperplanes and
that there is no noise in the data. Then $p$ must be of the form $p(x) = (b_1^\top x) \cdots (b_n^\top x)$.
In this case $C_{jj'}$ is simply the cosine of the angle between the normals to the
hyperplanes that are associated with points $x_j$ and $x_{j'}$. If both points lie in the same
hyperplane, their normals must be equal, and hence $C_{jj'} = 1$. Otherwise, $C_{jj'} < 1$
is the cosine of the angle between the hyperplanes. Thus, assuming that the smallest
angle between any two hyperplanes is sufficiently large and that the points are
well distributed on the union of the hyperplanes, applying spectral clustering to the
affinity matrix $C$ will in general yield the correct clustering.
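The construction of the angle-based affinity can be sketched numerically as follows. This is a minimal illustration assuming NumPy and noiseless toy data; the helper names `exponents`, `veronese`, `gradients` and `angle_affinity` are hypothetical and not taken from the paper, and the two coordinate planes used as data are an assumption made only for the example. Points with zero gradient (for instance points on an intersection of hyperplanes) are not handled.

```python
import itertools
import numpy as np

def exponents(D, n):
    """Exponent vectors of all degree-n monomials in D variables."""
    E = []
    for combo in itertools.combinations_with_replacement(range(D), n):
        e = [0] * D
        for i in combo:
            e[i] += 1
        E.append(e)
    return np.array(E)

def veronese(X, n):
    """Embedded data matrix: row j holds all degree-n monomials of point x_j."""
    E = exponents(X.shape[1], n)
    return np.prod(X[:, None, :] ** E[None, :, :], axis=2)

def gradients(c, X, n):
    """Gradient of p(x) = sum_k c[k] * x**E[k] at every row of X."""
    E = exponents(X.shape[1], n)
    G = np.zeros(X.shape)
    for i in range(X.shape[1]):
        Ei = E.copy()
        Ei[:, i] = np.maximum(Ei[:, i] - 1, 0)  # lower the power of x_i by one
        G[:, i] = np.prod(X[:, None, :] ** Ei[None, :, :], axis=2) @ (c * E[:, i])
    return G

def angle_affinity(X, n):
    """Absolute cosine between normalized gradients of one vanishing polynomial."""
    _, _, Vt = np.linalg.svd(veronese(X, n))
    c = Vt[-1]  # right singular vector of the smallest singular value
    G = gradients(c, X, n)
    G = G / np.linalg.norm(G, axis=1, keepdims=True)
    return np.abs(G @ G.T)

# Toy data (assumed for illustration): the planes x3 = 0 and x1 = 0 in R^3.
rng = np.random.default_rng(0)
P1 = np.column_stack([rng.normal(size=(4, 2)), np.zeros(4)])
P2 = np.column_stack([np.zeros(4), rng.normal(size=(4, 2))])
C = angle_affinity(np.vstack([P1, P2]), n=2)
```

In this toy setting the two hyperplanes are orthogonal, so the affinity is block diagonal; with a small angle between the hyperplanes the off-diagonal blocks would approach 1 and spectral clustering would become harder, in line with the discussion above.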

Even though SASC-A is much more robust in the presence of noise than purely 
algebraic methods for the case of a union of hyperplanes, it is fundamentally limited 
by the fact that, theoretically, it applies only to unions of hyperplanes. Indeed, if the 
orthogonal complement of a subspace $\mathcal{S}$ has dimension greater than 1, there may be
points $x, x'$ inside $\mathcal{S}$ such that the angle between $\nabla p|_x$ and $\nabla p|_{x'}$ is as large as 90°.
In such instances, points associated to the same subspace may be weakly connected 
and thus there is no guarantee for the success of spectral clustering. 

2.7. The challenge. As the discussion so far suggests, the state of the art in 
ASC can be summarized as follows: 

1. A complete closed form solution to the abstract subspace clustering problem 
(Problem 1) exists and can be found using the polynomial differentiation 
algorithm implied by Theorem 8. 

2. All known algorithmic variants of the polynomial differentiation algorithm 
are sensitive to noise, especially for subspaces of arbitrary dimensions. 

3. The recursive ASC algorithm described in section 2.5 does not in general 
solve the abstract subspace clustering problem (Problem 1), and is in addition 
sensitive to noise. 

4. The spectral algebraic algorithm described in section 2.6 is less sensitive to 
noise, but is theoretically justified only for unions of hyperplanes. 


The above list reveals the challenge that we will be addressing in the rest of
this paper: develop an ASC algorithm that solves the abstract subspace clustering
problem for perfect data, while at the same time being robust to noisy data.

3. Filtrated Algebraic Subspace Clustering - Overview. This section provides
an overview of our proposed Filtrated Algebraic Subspace Clustering (FASC)
algorithm, which conveys the geometry of the key idea of this paper while keeping
technicalities at a minimum. To that end, let us pretend for a moment that we have
access to the entire set $\mathcal{A}$, so that we can manipulate it via set operations such as
taking its intersection with some other set. Then the idea behind FASC is to construct
a descending filtration of the given subspace arrangement $\mathcal{A} \subset \mathbb{R}^D$, i.e., a sequence of
inclusions of subspace arrangements, that starts with $\mathcal{A}$ and terminates after a finite
number $c$ of steps with one of the irreducible components $\mathcal{S}$ of $\mathcal{A}$:

$$\mathcal{A} =: \mathcal{A}_0 \supset \mathcal{A}_1 \supset \mathcal{A}_2 \supset \cdots \supset \mathcal{A}_c = \mathcal{S}. \tag{17}$$

The mechanism for generating such a filtration is to construct a strictly descending
filtration of intermediate ambient spaces, i.e.,

$$\mathcal{V}_0 \supset \mathcal{V}_1 \supset \mathcal{V}_2 \supset \cdots, \tag{18}$$

such that $\mathcal{V}_0 = \mathbb{R}^D$, $\dim(\mathcal{V}_{s+1}) = \dim(\mathcal{V}_s) - 1$, and each $\mathcal{V}_s$ contains the same fixed
irreducible component $\mathcal{S}$ of $\mathcal{A}$. Then the filtration of subspace arrangements is obtained
by intersecting $\mathcal{A}$ with the filtration of ambient spaces, i.e.,

$$\mathcal{A}_0 := \mathcal{A} \supset \mathcal{A}_1 := \mathcal{A} \cap \mathcal{V}_1 \supset \mathcal{A}_2 := \mathcal{A} \cap \mathcal{V}_2 \supset \cdots. \tag{19}$$

This can be seen equivalently as constructing a descending filtration of pairs $(\mathcal{V}_s, \mathcal{A}_s)$,
where $\mathcal{A}_s$ is a subspace arrangement of $\mathcal{V}_s$:

$$(\mathbb{R}^D, \mathcal{A}) \hookleftarrow (\mathcal{V}_1 \cong \mathbb{R}^{D-1}, \mathcal{A}_1) \hookleftarrow (\mathcal{V}_2 \cong \mathbb{R}^{D-2}, \mathcal{A}_2) \hookleftarrow \cdots. \tag{20}$$

But how can we construct a filtration of ambient spaces (18) that satisfies the
apparently strong condition $\mathcal{V}_s \supset \mathcal{S}, \forall s$? The answer lies at the heart of ASC: to
construct $\mathcal{V}_1$, pick a suitable polynomial $p_1$ vanishing on $\mathcal{A}$ and evaluate its gradient
at a nonsingular point $x$ of $\mathcal{A}$. Notice that $x$ will lie in some irreducible component
$\mathcal{S}_x$ of $\mathcal{A}$. Then take $\mathcal{V}_1$ to be the hyperplane of $\mathbb{R}^D$ defined by the gradient of $p_1$ at
$x$. We know from Proposition 56 that $\mathcal{V}_1$ must contain $\mathcal{S}_x$. To construct $\mathcal{V}_2$ we apply
essentially the same procedure on the pair $(\mathcal{V}_1, \mathcal{A}_1)$: take a suitable polynomial $p_2$
that vanishes on $\mathcal{A}_1$ but does not vanish on $\mathcal{V}_1$, and take $\mathcal{V}_2$ to be the hyperplane of
$\mathcal{V}_1$ defined by $\pi_{\mathcal{V}_1}(\nabla p_2|_x)$. As we will show in section 4, it is always the case that
$\pi_{\mathcal{V}_1}(\nabla p_2|_x) \perp \mathcal{S}_x$ and so $\mathcal{V}_2 \supset \mathcal{S}_x$. Now notice that after precisely $c$ such steps,
where $c$ is the codimension of $\mathcal{S}_x$, $\mathcal{V}_c$ will be a $(D-c)$-dimensional linear subspace of
$\mathbb{R}^D$ that by construction contains $\mathcal{S}_x$. But $\mathcal{S}_x$ is also a $(D-c)$-dimensional subspace
and the only possibility is that $\mathcal{V}_c = \mathcal{S}_x$. Observe also that this is precisely the step
where the filtration naturally terminates, since there is no polynomial that vanishes
on $\mathcal{S}_x$ but does not vanish on $\mathcal{V}_c$. The relations between the intermediate ambient
spaces and subspace arrangements are illustrated in the commutative diagram of (21).
The filtration in (21) will yield the irreducible component $\mathcal{S} = \mathcal{S}_x$ of $\mathcal{A}$ that contains
the nonsingular point $x \in \mathcal{A}$ that we started with. We will be referring to such a
point as the reference point. We can also take without loss of generality $\mathcal{S}_x = \mathcal{S}_1$.
Having identified $\mathcal{S}_1$, we can pick a nonsingular point $x' \in \mathcal{A} - \mathcal{S}_x$ and construct a
filtration of $\mathcal{A}$ as above with reference point $x'$. Such a filtration will terminate with
the irreducible component $\mathcal{S}_{x'}$ of $\mathcal{A}$ containing $x'$, which without loss of generality we
take to be $\mathcal{S}_2$. Picking a new reference point $x'' \in \mathcal{A} - (\mathcal{S}_x \cup \mathcal{S}_{x'})$ and so on, we can
identify the entire list of irreducible components of $\mathcal{A}$, as described in Algorithm 1.

(Footnote to (17): We will also be using the notation $\mathcal{A} =: \mathcal{A}_0 \hookleftarrow \mathcal{A}_1 \hookleftarrow \mathcal{A}_2 \hookleftarrow \cdots$, where the
arrows denote embeddings.)


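$$\begin{array}{ccccccccc}
\mathbb{R}^{D} & & \mathbb{R}^{D-1} & & \mathbb{R}^{D-2} & & \cdots & & \mathbb{R}^{D-c} \\
\cong & & \cong & & \cong & & & & \cong \\
\mathcal{V}_0 & \hookleftarrow & \mathcal{V}_1 & \hookleftarrow & \mathcal{V}_2 & \hookleftarrow & \cdots & \hookleftarrow & \mathcal{V}_c \\
\cup & & \cup & & \cup & & & & \Vert \\
\mathcal{A}_0 & \hookleftarrow & \mathcal{A}_1 & \hookleftarrow & \mathcal{A}_2 & \hookleftarrow & \cdots & \hookleftarrow & \mathcal{A}_c \\
\cup & & \cup & & \cup & & & & \Vert \\
\mathcal{S}_x & = & \mathcal{S}_x & = & \mathcal{S}_x & = & \cdots & = & \mathcal{S}_x
\end{array} \tag{21}$$
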


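Algorithm 1 Filtrated Algebraic Subspace Clustering (FASC) - Geometric Version
 1: procedure FASC($\mathcal{A}$)
 2:     $\mathfrak{L} \leftarrow \emptyset$; $\mathcal{L} \leftarrow \emptyset$;
 3:     while $\mathcal{A} - \mathcal{L} \neq \emptyset$ do
 4:         pick a nonsingular point $x$ in $\mathcal{A} - \mathcal{L}$;
 5:         $\mathcal{V} \leftarrow \mathbb{R}^D$;
 6:         while $\mathcal{V} \cap \mathcal{A} \subsetneq \mathcal{V}$ do
 7:             find polynomial $p$ that vanishes on $\mathcal{A} \cap \mathcal{V}$ but not on $\mathcal{V}$, s.t. $\nabla p|_x \neq 0$;
 8:             let $\mathcal{V}$ be the orthogonal complement of $\pi_{\mathcal{V}}(\nabla p|_x)$ in $\mathcal{V}$;
 9:         end while
10:         $\mathfrak{L} \leftarrow \mathfrak{L} \cup \{\mathcal{V}\}$; $\mathcal{L} \leftarrow \mathcal{L} \cup \mathcal{V}$;
11:     end while
12:     return $\mathfrak{L}$;
13: end procedure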


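Example 11. Consider the setting of Examples 2 and 3. Suppose that in the first
filtration the algorithm picks as reference point $x \in \mathcal{S}_2 - (\mathcal{S}_1 \cup \mathcal{S}_3)$. Suppose further
that the algorithm picks the polynomial $p(x) = (b_1^\top x)(f^\top x)$, which vanishes on $\mathcal{A}$ but
certainly not on $\mathbb{R}^3$. Then the first ambient space $\mathcal{V}_1$ of the filtration associated to $x$
is constructed as $\mathcal{V}_1 = \mathrm{Span}(\nabla p|_x)^\perp$. Since $\nabla p|_x \cong f$ (equality up to a nonzero scalar),
this gives that $\mathcal{V}_1$ is precisely the plane of $\mathbb{R}^3$ with normal vector $f$. Then $\mathcal{A}_1$ is
constructed as $\mathcal{A}_1 = \mathcal{A} \cap \mathcal{V}_1$, which consists of the union of three lines
$\mathcal{S}_2 \cup \mathcal{S}_3 \cup \mathcal{S}_4$, where $\mathcal{S}_4$ is the intersection of $\mathcal{V}_1$
with $\mathcal{S}_1$ (see Figs. 3(a) and 3(b)).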
Fig. 3. (a): The plane spanned by lines $\mathcal{S}_2$ and $\mathcal{S}_3$ intersects the plane $\mathcal{S}_1$ at the line $\mathcal{S}_4$. (b):
Intersection of the original subspace arrangement $\mathcal{A} = \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3$ with the intermediate ambient
space $\mathcal{V}_1$, giving rise to the intermediate subspace arrangement $\mathcal{A}_1 = \mathcal{S}_2 \cup \mathcal{S}_3 \cup \mathcal{S}_4$. (c): Geometry
of the unique degree-3 polynomial $p(x) = (b_2^\top x)(b_3^\top x)(b_4^\top x)$ that vanishes on $\mathcal{S}_2 \cup \mathcal{S}_3 \cup \mathcal{S}_4$ as a variety
of the intermediate ambient space $\mathcal{V}_1$; $b_i \perp \mathcal{S}_i$, $i = 2, 3, 4$.


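Since $\mathcal{A}_1 \subsetneq \mathcal{V}_1$, the algorithm takes one more step in the filtration. Suppose that
the algorithm picks the polynomial $q(x) = (b_2^\top x)(b_3^\top x)(b_4^\top x)$, where $b_i$ is the unique
normal vector of $\mathcal{V}_1$ that is orthogonal to $\mathcal{S}_i$, for $i = 2, 3, 4$ (see Fig. 3(c)). Because of
the general position assumption, none of the lines $\mathcal{S}_2, \mathcal{S}_3, \mathcal{S}_4$ is orthogonal to another.
Consequently, $\nabla q|_x = (b_3^\top x)(b_4^\top x) b_2 \neq 0$. Moreover, since $b_2 \in \mathcal{V}_1$, we have that
$\pi_{\mathcal{V}_1}(\nabla q|_x) = \nabla q|_x \cong b_2$ (equality up to a nonzero scalar), which defines a line $\mathcal{V}_2$ in $\mathcal{V}_1$
that must contain $\mathcal{S}_2$. Intersecting $\mathcal{A}_1$
with $\mathcal{V}_2$ we obtain $\mathcal{A}_2 = \mathcal{A}_1 \cap \mathcal{V}_2 = \mathcal{V}_2$ and the filtration terminates with output the
irreducible component $\mathcal{S}_x = \mathcal{S}_2 = \mathcal{V}_2$ of $\mathcal{A}$ associated to reference point $x$.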

Continuing, the algorithm now picks a new reference point $x' \in \mathcal{A} - \mathcal{S}_x$, say
$x' \in \mathcal{S}_1$. A similar process as above will identify $\mathcal{S}_1$ as the intermediate ambient space
$\mathcal{V}_1 = \mathcal{S}_{x'}$ of the filtration associated to $x'$ that arises after one step. Then a third
reference point will be chosen as $x'' \in \mathcal{A} - (\mathcal{S}_x \cup \mathcal{S}_{x'})$ and $\mathcal{S}_3$ will be identified as the
intermediate ambient space $\mathcal{V}_2 = \mathcal{S}_{x''}$ of the filtration associated to $x''$ that arises after
two steps. Since the set $\mathcal{A} - (\mathcal{S}_x \cup \mathcal{S}_{x'} \cup \mathcal{S}_{x''})$ is empty, the algorithm will terminate and
return $\{\mathcal{S}_x, \mathcal{S}_{x'}, \mathcal{S}_{x''}\}$, which is up to a permutation a decomposition of the original
subspace arrangement into its constituent subspaces.
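The first filtration of Example 11 can be checked numerically on a concrete instance. The sketch below assumes NumPy; the specific vectors `b1`, `u2`, `u3` are hypothetical choices made for illustration (they are not given in the paper), picked so that no two of the lines $\mathcal{S}_2, \mathcal{S}_3, \mathcal{S}_4$ are orthogonal. Gradients are written out by hand from the product rule rather than computed symbolically.

```python
import numpy as np

# Hypothetical concrete instance of the arrangement (assumed vectors, not from the paper).
b1 = np.array([1.0, 2.0, 1.0])   # normal of the plane S1
u2 = np.array([1.0, 0.0, 0.0])   # direction of the line S2
u3 = np.array([1.0, 1.0, 0.0])   # direction of the line S3
f = np.cross(u2, u3)             # normal of the plane spanned by S2 and S3
x = u2.copy()                    # reference point in S2, outside S1 and S3

# Step 1: p(y) = (b1.y)(f.y) vanishes on A; the product rule gives its gradient at x.
grad_p = (f @ x) * b1 + (b1 @ x) * f   # equals (b1.x) f because f.x = 0
u4 = np.cross(b1, f)                   # direction of the ghost line S4 = S1 ∩ V1

# Step 2: inside V1 = Span(f)^perp, b_i is the vector of V1 orthogonal to the line S_i.
b2, b3, b4 = (np.cross(f, u) for u in (u2, u3, u4))
# q(y) = (b2.y)(b3.y)(b4.y); since b2.x = 0 only one product-rule term survives.
grad_q = (b3 @ x) * (b4 @ x) * b2
n1 = grad_p / np.linalg.norm(grad_p)
proj = grad_q - (grad_q @ n1) * n1     # projection of grad q onto V1

# V2 = Span(grad_p, proj)^perp is a line in R^3; the cross product spans it.
v2 = np.cross(grad_p, proj)
print(v2 / np.linalg.norm(v2))
```

For this instance the direction `v2` is parallel to `u2`, i.e. the filtration with reference point $x \in \mathcal{S}_2$ terminates at $\mathcal{V}_2 = \mathcal{S}_2$ after two steps, as in the example.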

Strictly speaking, Algorithm 1 is not a valid algorithm in the computer-science 
theoretic sense, since it takes as input an infinite set A, and it involves operations 
such as checking equality of the infinite sets $\mathcal{V}$ and $\mathcal{A} \cap \mathcal{V}$. Moreover, the reader may
reasonably ask:

1. Why is it the case that through the entire filtration associated with reference
point $x$ we can always find polynomials $p$ such that $\nabla p|_x \neq 0$?

2. Why is it true that even if $\nabla p|_x \neq 0$ then $\pi_{\mathcal{V}}(\nabla p|_x) \neq 0$?

We address all issues above and beyond in the next section, which is devoted to 
rigorously establishing the theory of the FASC algorithm. 

4. Filtrated Algebraic Subspace Clustering - Theory. This section formalizes
the concepts outlined in section 3. Section 4.1 formalizes the notion of a set $\mathcal{X}$
being in general position inside a subspace arrangement $\mathcal{A}$. Sections 4.2-4.4 establish
the theory of a single filtration of a finite subset $\mathcal{X}$ lying in general position inside
a transversal subspace arrangement $\mathcal{A}$, and culminate with the Algebraic Descending
Filtration (ADF) algorithm for identifying a single irreducible component of $\mathcal{A}$
(Algorithm 2) and the theorem establishing its correctness (Theorem 27). The ADF
algorithm naturally leads us to the core contribution of this paper in section 4.5, which
is the FASC algorithm for identifying all irreducible components of $\mathcal{A}$ (Algorithm 3)
and the theorem establishing its correctness (Theorem 28).

(Footnote: At this point the reader unfamiliar with algebraic geometry is encouraged to read the
appendices before proceeding.)

4.1. Data in general position in a subspace arrangement. From an algebraic
geometric point of view, a union $\mathcal{A}$ of linear subspaces is the same as the set $\mathcal{I}_{\mathcal{A}}$
of polynomial functions that vanish on $\mathcal{A}$. However, from a computer-science-theoretic
point of view, $\mathcal{A}$ and $\mathcal{I}_{\mathcal{A}}$ are quite different: $\mathcal{A}$ is an infinite set and hence
it cannot be given as input to any algorithm. On the other hand, even though $\mathcal{I}_{\mathcal{A}}$ is
also an infinite set, it is generated as an ideal by a finite set of polynomials, which can
certainly serve as input to an algorithm. That said, from a machine-learning point of
view, both $\mathcal{A}$ and $\mathcal{I}_{\mathcal{A}}$ are often unknown, and one is usually given only a finite set of
points $\mathcal{X}$ in $\mathcal{A}$, from which we wish to compute its irreducible components $\mathcal{S}_1, \dots, \mathcal{S}_n$.

To lend ourselves the power of the algebraic-geometric machinery, while providing
an algorithm of interest to the machine learning and computer science communities,
we adopt the following setting. The input to our algorithm will be the pair $(\mathcal{X}, m)$,
where $\mathcal{X}$ is a finite subset of an unknown union of linear subspaces $\mathcal{A} := \bigcup_{i=1}^{n} \mathcal{S}_i$ of
$\mathbb{R}^D$ and $m$ is an upper bound on $n$. To make the problem of recovering the decomposition
$\mathcal{A} = \bigcup_{i=1}^{n} \mathcal{S}_i$ from $\mathcal{X}$ well-defined, it is necessary that $\mathcal{A}$ be uniquely identifiable from
$\mathcal{X}$. In other words, $\mathcal{X}$ must be in general position inside $\mathcal{A}$, as defined next.

Definition 12 (Points in general position). Let $\mathcal{X} = \{x_1, \dots, x_N\}$ be a finite
subset of a subspace arrangement $\mathcal{A} = \mathcal{S}_1 \cup \cdots \cup \mathcal{S}_n$. We say that $\mathcal{X}$ is in general
position in $\mathcal{A}$ with respect to degree $m$, if $m \ge n$ and $\mathcal{A} = \mathcal{Z}(\mathcal{I}_{\mathcal{X},m})$, i.e., if $\mathcal{A}$ is
precisely the zero locus of all homogeneous polynomials of degree $m$ that vanish on $\mathcal{X}$.

The intuitive geometric condition $\mathcal{A} = \mathcal{Z}(\mathcal{I}_{\mathcal{X},m})$ of Definition 12 guarantees that
there are no spurious polynomials of degree less than or equal to $m$ that vanish on $\mathcal{X}$.
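The absence of spurious vanishing polynomials can be probed numerically by comparing, degree by degree, the dimension of the space of polynomials vanishing on $\mathcal{X}$ with that of polynomials vanishing on $\mathcal{A}$. The sketch below assumes NumPy and uses a dense sample of $\mathcal{A}$ as a stand-in for $\mathcal{A}$ itself, which is a heuristic assumption; the function names and the two-line toy arrangement in $\mathbb{R}^2$ are hypothetical and not from the paper.

```python
import itertools
import numpy as np

def veronese(X, k):
    """Row j holds all degree-k monomials of the point X[j]."""
    combos = itertools.combinations_with_replacement(range(X.shape[1]), k)
    return np.column_stack([np.prod(X[:, list(c)], axis=1) for c in combos])

def vanishing_dims(X, m):
    """Dimension of the degree-k vanishing polynomials of X, for k = 1..m."""
    dims = []
    for k in range(1, m + 1):
        V = veronese(X, k)
        dims.append(V.shape[1] - np.linalg.matrix_rank(V))
    return dims

def looks_in_general_position(X, A_sample, m):
    """Heuristic check: X (a subset of A) has no extra vanishing polynomials
    up to degree m, compared with a dense sample of A."""
    return vanishing_dims(X, m) == vanishing_dims(A_sample, m)

# Toy arrangement (assumed): the lines spanned by (1, 0) and (1, 1) in R^2.
t = np.arange(1.0, 6.0)[:, None]
dense = np.vstack([t * np.array([1.0, 0.0]), t * np.array([1.0, 1.0])])
X_good = np.array([[1.0, 0.0], [1.0, 1.0]])   # one point on each line
X_bad = np.array([[1.0, 0.0], [2.0, 0.0]])    # both points on the same line
print(looks_in_general_position(X_good, dense, 2))
print(looks_in_general_position(X_bad, dense, 2))
```

Since $\mathcal{X} \subset \mathcal{A}$ implies that every polynomial vanishing on $\mathcal{A}$ vanishes on $\mathcal{X}$, equality of the dimensions is enough to conclude equality of the two spaces of vanishing polynomials in each degree.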

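Proposition 13. Let $\mathcal{X}$ be a finite subset of an arrangement $\mathcal{A}$ of $n$ linear
subspaces of $\mathbb{R}^D$. Then $\mathcal{X}$ lies in general position inside $\mathcal{A}$ with respect to degree $m$ if
and only if $\mathcal{I}_{\mathcal{A},k} = \mathcal{I}_{\mathcal{X},k}$, $\forall k \le m$.

Proof. ($\Rightarrow$) We first show that $\mathcal{I}_{\mathcal{A},m} = \mathcal{I}_{\mathcal{X},m}$. Since $\mathcal{A} \supset \mathcal{X}$, every homogeneous
polynomial of degree $m$ that vanishes on $\mathcal{A}$ must vanish on $\mathcal{X}$, i.e., $\mathcal{I}_{\mathcal{A},m} \subset \mathcal{I}_{\mathcal{X},m}$.
Conversely, the hypothesis $\mathcal{A} = \mathcal{Z}(\mathcal{I}_{\mathcal{X},m})$ implies that every polynomial of $\mathcal{I}_{\mathcal{X},m}$ must
vanish on $\mathcal{A}$, i.e., $\mathcal{I}_{\mathcal{A},m} \supset \mathcal{I}_{\mathcal{X},m}$.

Now let $k < m$. As before, since $\mathcal{A} \supset \mathcal{X}$, we must have $\mathcal{I}_{\mathcal{A},k} \subset \mathcal{I}_{\mathcal{X},k}$. For
the converse direction, suppose for the sake of contradiction that there exists some
$p \in \mathcal{I}_{\mathcal{X},k}$ that does not vanish on $\mathcal{A}$. This means that there must exist an irreducible
component of $\mathcal{A}$, say $\mathcal{S}_1$, such that $p$ does not vanish on $\mathcal{S}_1$. Let $\zeta$ be a vector of $\mathbb{R}^D$
non-orthogonal to $\mathcal{S}_1$, i.e., the linear form $g(x) = \zeta^\top x$ does not vanish on $\mathcal{S}_1$.
Since $p$ vanishes on $\mathcal{X}$, so will the degree $m$ polynomial $g^{m-k} p$, i.e., $g^{m-k} p \in \mathcal{I}_{\mathcal{X},m}$.
But we have already shown that $\mathcal{I}_{\mathcal{X},m} = \mathcal{I}_{\mathcal{A},m}$, and so it must be the case that
$g^{m-k} p \in \mathcal{I}_{\mathcal{A},m}$. Since $g^{m-k} p$ vanishes on $\mathcal{A}$, it must vanish on $\mathcal{S}_1$, i.e., $g^{m-k} p \in \mathcal{I}_{\mathcal{S}_1}$.
Since by hypothesis $p \notin \mathcal{I}_{\mathcal{S}_1}$, and since $\mathcal{I}_{\mathcal{S}_1}$ is a prime ideal (see Proposition 53), it must be the
case that $g^{m-k} \in \mathcal{I}_{\mathcal{S}_1}$. But again because $\mathcal{I}_{\mathcal{S}_1}$ is a prime ideal, we must have that
$g \in \mathcal{I}_{\mathcal{S}_1}$. But this is true if and only if $\zeta \perp \mathcal{S}_1$, which contradicts the definition of $\zeta$.
($\Leftarrow$) Suppose $\mathcal{I}_{\mathcal{A},k} = \mathcal{I}_{\mathcal{X},k}$, $\forall k \le m$. We will show that $\mathcal{A} = \mathcal{Z}(\mathcal{I}_{\mathcal{X},m})$. But this
is the same as showing that $\mathcal{A} = \mathcal{Z}(\mathcal{I}_{\mathcal{A},m})$, which is true by Proposition 55. □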
The next Proposition ensures the existence of points in general position with
respect to any degree $m \ge n$.

Proposition 14. Let $\mathcal{A}$ be an arrangement of $n$ linear subspaces of $\mathbb{R}^D$ and let
$m$ be any integer $\ge n$. Then there exists a finite subset $\mathcal{X} \subset \mathcal{A}$ that is in general
position inside $\mathcal{A}$ with respect to degree $m$.

Proof. By Proposition 55, $\mathcal{I}_{\mathcal{A}}$ is generated by polynomials of degree $\le m$. Then by
Theorem 2.9 in [25], there exists a finite set $\mathcal{X} \subset \mathcal{A}$ such that $\mathcal{I}_{\mathcal{A},k} = \mathcal{I}_{\mathcal{X},k}$, $\forall k \le m$,
which concludes the proof in view of Proposition 13. □

Notice that there is a price to be paid by requiring $\mathcal{X}$ to be in general position,
which is that we need the cardinality of $\mathcal{X}$ to be artificially large, especially when
$m - n$ is large. In particular, since the dimension of $\mathcal{I}_{\mathcal{X},m}$ must match the dimension
of $\mathcal{I}_{\mathcal{A},m}$, the cardinality of $\mathcal{X}$ must be at least $\mathcal{M}_m(D) - \dim(\mathcal{I}_{\mathcal{A},m})$.
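To get a feel for this lower bound, the short sketch below evaluates it in the simplest case. It assumes that $\mathcal{M}_m(D)$ denotes the number of degree-$m$ monomials in $D$ variables, $\binom{m+D-1}{m}$, and it specializes to an arrangement of $n$ distinct hyperplanes with $m = n$, where the only degree-$n$ vanishing polynomials are the scalar multiples of the product of the $n$ linear forms, so that $\dim(\mathcal{I}_{\mathcal{A},n}) = 1$. The function names are hypothetical.

```python
from math import comb

def num_monomials(m, D):
    """Number of monomials of degree m in D variables."""
    return comb(m + D - 1, m)

def min_points_hyperplanes(n, D):
    """Lower bound on |X| for n distinct hyperplanes of R^D with m = n,
    using dim I_{A,n} = 1 (assumption valid for hyperplane arrangements)."""
    return num_monomials(n, D) - 1

for n in (2, 3, 4):
    print(n, min_points_hyperplanes(n, D=3))
```

The bound grows polynomially in $D$ for fixed $m$, but quickly in $m$, which is the price referred to above.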

The next result will be useful in the sequel. 

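Lemma 15. Suppose that $\mathcal{X}$ is in general position inside $\mathcal{A}$ with respect to degree
$m$. Let $n' < n$. Then the set $\mathcal{X}^{(n')} := \mathcal{X} - \bigcup_{i=1}^{n'} \mathcal{X}_i$ lies in general position inside the
subspace arrangement $\mathcal{A}^{(n')} := \mathcal{S}_{n'+1} \cup \cdots \cup \mathcal{S}_n$ with respect to degree $m - n'$.

Proof. We begin by noting that $m - n'$ is an upper bound on the number of
subspaces of the arrangement $\mathcal{A}^{(n')}$. According to Proposition 13, it is enough to
prove that a homogeneous polynomial $p$ of degree less than or equal to $m - n'$ vanishes
on $\mathcal{X}^{(n')}$ if and only if it vanishes on $\mathcal{A}^{(n')}$. So let $p$ be a homogeneous polynomial of
degree less than or equal to $m - n'$. If $p$ vanishes on $\mathcal{A}^{(n')}$, then it certainly vanishes on
$\mathcal{X}^{(n')}$. It remains to prove the converse. So suppose that $p$ vanishes on $\mathcal{X}^{(n')}$. Suppose
that for each $i = 1, \dots, n'$ we have a vector $\zeta_i \perp \mathcal{S}_i$, such that $\zeta_i \not\perp \mathcal{S}_{n'+1}, \dots, \mathcal{S}_n$.
Next, define the polynomial $r(x) = (\zeta_1^\top x) \cdots (\zeta_{n'}^\top x)\, p(x)$. Then $r$ has degree $\le m$ and
vanishes on $\mathcal{X}$. Since $\mathcal{X}$ is in general position inside $\mathcal{A}$, $r$ must vanish on $\mathcal{A}$. For the
sake of contradiction suppose that $p$ does not vanish on $\mathcal{A}^{(n')}$. Then $p$ does not vanish,
say, on $\mathcal{S}_n$. On the other hand $r$ does vanish on $\mathcal{S}_n$, hence $r \in \mathcal{I}_{\mathcal{S}_n}$ or equivalently
$(\zeta_1^\top x) \cdots (\zeta_{n'}^\top x)\, p(x) \in \mathcal{I}_{\mathcal{S}_n}$. Since $\mathcal{I}_{\mathcal{S}_n}$ is a prime ideal we must have either $\zeta_i^\top x \in \mathcal{I}_{\mathcal{S}_n}$
for some $i \in [n']$, or $p \in \mathcal{I}_{\mathcal{S}_n}$. Now, the latter cannot be true by hypothesis, thus
we must have $\zeta_i^\top x \in \mathcal{I}_{\mathcal{S}_n}$ for some $i \in [n']$. But this implies that $\zeta_i \perp \mathcal{S}_n$, which
contradicts the hypothesis on $\zeta_i$. Hence it must be the case that $p$ vanishes on $\mathcal{A}^{(n')}$.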

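To complete the proof we show that such vectors $\zeta_i$, $i = 1, \dots, n'$, always exist.
It is enough to prove the existence of $\zeta_1$. If every vector of $\mathbb{R}^D$ orthogonal to $\mathcal{S}_1$
were orthogonal to, say, $\mathcal{S}_n$, then we would have that $\mathcal{S}_1^\perp \subset \mathcal{S}_n^\perp$, or equivalently,
$\mathcal{S}_1 \supset \mathcal{S}_n$, which is impossible since $\mathcal{S}_1$ and $\mathcal{S}_n$ are distinct irreducible components of $\mathcal{A}$.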

Remark 16. Notice that the notion of points X lying in general position inside a 
subspace arrangement A is independent of the notion of transversality of A (Definition 
4). Nevertheless, to facilitate the technical analysis by avoiding degenerate cases of 
subspace arrangements, in the rest of section 4 we will assume that A is transversal. 
For a geometric interpretation of transversality as well as examples, the reader is 
encouraged to consult Appendix C. 

4.2. Constructing the first step of a filtration. We will now show how to 
construct the first step of a descending filtration associated with a single irreducible 
component of $\mathcal{A}$, as in (21). Once again, we are given the pair $(\mathcal{X}, m)$, where $\mathcal{X}$ is a
finite set in general position inside $\mathcal{A}$ with respect to degree $m$, $\mathcal{A}$ is transversal, and
m is an upper bound on the number n of irreducible components of A (section 4.1). 

To construct the first step of the filtration, we need to find a first hyperplane $\mathcal{V}_1$ of
$\mathbb{R}^D$ that contains some irreducible component $\mathcal{S}_i$ of $\mathcal{A}$. According to Proposition 56, it
would be enough to have a polynomial $p_1$ that vanishes on the irreducible component
$\mathcal{S}_i$ together with a point $x \in \mathcal{S}_i$. Then $\nabla p_1|_x$ would be the normal to a hyperplane
$\mathcal{V}_1$ containing $\mathcal{S}_i$. Since every polynomial that vanishes on $\mathcal{A}$ necessarily vanishes on
$\mathcal{S}_i$, $i = 1, \dots, n$, a reasonable choice is a vanishing polynomial of minimal degree $k$,
i.e., some $0 \neq p_1 \in \mathcal{I}_{\mathcal{A},k}$, where $k$ is the smallest degree at which $\mathcal{I}_{\mathcal{A}}$ is non-zero. Since
$\mathcal{X}$ is assumed in general position in $\mathcal{A}$ with respect to degree $m$, by Proposition 13
we will have $\mathcal{I}_{\mathcal{A},k} = \mathcal{I}_{\mathcal{X},k}$, and so our $p_1$ can be computed as an element of the right
null space of the embedded data matrix $\nu_k(\mathcal{X})$. The next Lemma ensures that given
any such $p_1$, there is always a point $x$ in $\mathcal{X}$ such that $\nabla p_1|_x \neq 0$.

Lemma 17. Let $0 \neq p_1 \in \mathcal{I}_{\mathcal{X},k}$ be a vanishing polynomial of minimal degree. Then
there exists $0 \neq x \in \mathcal{X}$ such that $\nabla p_1|_x \neq 0$, and moreover, without loss of generality
$x \in \mathcal{S}_1 - \bigcup_{i>1} \mathcal{S}_i$.

Proof. We first establish the existence of a point $x \in \mathcal{X}$ such that $\nabla p_1|_x \neq 0$. For
the sake of contradiction, suppose that no such $x \in \mathcal{X}$ exists. Since $0 \neq p_1 \in \mathcal{I}_{\mathcal{X},k}$,
$p_1$ cannot be a constant polynomial, and so there exists some $j \in [D]$ such that
the degree $k-1$ polynomial $\frac{\partial p_1}{\partial x_j}$ is not the zero polynomial. Now, by hypothesis
$\nabla p_1|_x = 0$, $\forall x \in \mathcal{X}$, hence $\frac{\partial p_1}{\partial x_j}\big|_x = 0$, $\forall x \in \mathcal{X}$. But then $0 \neq \frac{\partial p_1}{\partial x_j} \in \mathcal{I}_{\mathcal{X},k-1}$, and
this would contradict the hypothesis that $k$ is the smallest index such that $\mathcal{I}_{\mathcal{X},k} \neq 0$.
Hence there exists $x \in \mathcal{X}$ such that $\nabla p_1|_x \neq 0$. To show that $x$ can be chosen to be
non-zero, note that if $k = 1$, then $\nabla p_1$ is a constant vector and we can take $x$ to be
any non-zero element of $\mathcal{X}$. If $k > 1$ then $\nabla p_1|_0 = 0$ and so $x$ must necessarily be
different from zero.

Next, we establish that $x \in \mathcal{S}_1 - \bigcup_{i>1} \mathcal{S}_i$. Without loss of generality we can assume
that $x \in \mathcal{X}_1 := \mathcal{X} \cap \mathcal{S}_1$. For the sake of contradiction, suppose that $x \in \mathcal{S}_1 \cap \mathcal{S}_i$ for
some $i > 1$. Since $x \neq 0$, there is some index $j \in [D]$ such that the $j$th coordinate of $x$,
denoted by $x_j$, is different from zero. Define $g(x) := x_j^{n-k} p_1(x)$. Then $g \in \mathcal{I}_{\mathcal{X},n}$ and
by the general position assumption we also have that $g \in \mathcal{I}_{\mathcal{A},n}$. Since $\mathcal{A}$ is assumed
transversal, by Theorem 58, $g$ can be written in the form

$$g = \sum_{r_i \in [c_i],\, i \in [n]} c_{r_1,\dots,r_n}\, l_{r_1,1} \cdots l_{r_n,n}, \tag{22}$$

where $c_{r_1,\dots,r_n} \in \mathbb{R}$ is a scalar coefficient, $l_{r_i,i}$ is a linear form vanishing on $\mathcal{S}_i$, and the
summation runs over all multi-indices $(r_1, \dots, r_n) \in [c_1] \times \cdots \times [c_n]$. Then evaluating
the gradient of the expression on the right of (22) at $x$, and using the hypothesis that
$x \in \mathcal{S}_1 \cap \mathcal{S}_i$ for some $i > 1$, we see that $\nabla g|_x = 0$. However, evaluating the gradient
of $g$ at $x$ from the formula $g(x) := x_j^{n-k} p_1(x)$, we get $\nabla g|_x = x_j^{n-k} \nabla p_1|_x \neq 0$. This
contradiction implies that the hypothesis $x \in \mathcal{S}_1 \cap \mathcal{S}_i$ for some $i > 1$ cannot be true,
i.e., $x$ lies only in the irreducible component $\mathcal{S}_1$. □
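The first step described above (a vanishing polynomial of minimal degree, followed by a data point with non-zero gradient) can be sketched numerically as follows. This is a minimal illustration for noiseless data, assuming NumPy; the function names, the rank tolerance, and the rule of picking the point with the largest gradient norm are assumptions made for the example and are not prescribed by the paper.

```python
import itertools
import numpy as np

def exponents(D, k):
    """Exponent vectors of all degree-k monomials in D variables."""
    E = []
    for combo in itertools.combinations_with_replacement(range(D), k):
        e = [0] * D
        for i in combo:
            e[i] += 1
        E.append(e)
    return np.array(E)

def veronese(X, k):
    """Embedded data matrix: row j holds all degree-k monomials of X[j]."""
    E = exponents(X.shape[1], k)
    return np.prod(X[:, None, :] ** E[None, :, :], axis=2)

def gradients(c, X, k):
    """Gradient of p(x) = sum_r c[r] * x**E[r] at every row of X."""
    E = exponents(X.shape[1], k)
    G = np.zeros(X.shape)
    for i in range(X.shape[1]):
        Ei = E.copy()
        Ei[:, i] = np.maximum(Ei[:, i] - 1, 0)
        G[:, i] = np.prod(X[:, None, :] ** Ei[None, :, :], axis=2) @ (c * E[:, i])
    return G

def minimal_vanishing_polynomial(X, max_degree, tol=1e-9):
    """Smallest degree k with a non-trivial right null space, and one null vector."""
    for k in range(1, max_degree + 1):
        V = veronese(X, k)
        _, s, Vt = np.linalg.svd(V)
        rank = int(np.sum(s > tol * s[0]))
        if rank < V.shape[1]:
            return k, Vt[-1]
    return None

def reference_point(X, c, k):
    """Index of the data point with the largest gradient norm, and that gradient."""
    G = gradients(c, X, k)
    j = int(np.argmax(np.linalg.norm(G, axis=1)))
    return j, G[j]
```

As a usage example, for data sampled from the plane $x_3 = 0$ together with the line spanned by $(0,0,1)$ in $\mathbb{R}^3$, one would call `k, c = minimal_vanishing_polynomial(X, max_degree=3)` and then `j, b1 = reference_point(X, c, k)`; the vector `b1` plays the role of $b_1 = \nabla p_1|_x$ in what follows.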

Using the notation established so far and setting $b_1 := \nabla p_1|_x$, the hyperplane of
$\mathbb{R}^D$ given by $\mathcal{V}_1 = \mathrm{Span}(b_1)^\perp = \mathcal{Z}(b_1^\top x)$ contains the irreducible component of $\mathcal{A}$
associated with the reference point $x$, i.e., $\mathcal{V}_1 \supset \mathcal{S}_1$. Then we can define a subspace
sub-arrangement $\mathcal{A}_1$ of $\mathcal{A}$ by

$$\mathcal{A}_1 := \mathcal{A} \cap \mathcal{V}_1 = \mathcal{S}_1 \cup (\mathcal{S}_2 \cap \mathcal{V}_1) \cup \cdots \cup (\mathcal{S}_n \cap \mathcal{V}_1). \tag{23}$$

Observe that $\mathcal{A}_1$ can be viewed as a subspace arrangement of $\mathcal{V}_1$, since $\mathcal{A}_1 \subset \mathcal{V}_1$
(see also the commutative diagram of eq. (21)). Certainly, our algorithm cannot
manipulate directly the infinite sets $\mathcal{A}$ and $\mathcal{V}_1$. Nevertheless, these sets are algebraic
varieties and as a consequence we can perform their intersection in the algebraic
domain. That is, we can obtain a set of polynomials defining $\mathcal{A} \cap \mathcal{V}_1$, as shown next.

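Lemma 18. $\mathcal{A}_1 := \mathcal{A} \cap \mathcal{V}_1$ is the zero set of the ideal generated by $\mathcal{I}_{\mathcal{X},m}$ and $b_1^\top x$,
i.e.,

$$\mathcal{A}_1 = \mathcal{Z}(\mathfrak{a}_1), \quad \mathfrak{a}_1 := \langle \mathcal{I}_{\mathcal{X},m} \rangle + \langle b_1^\top x \rangle. \tag{24}$$

Proof. ($\Rightarrow$): We will show that $\mathcal{A}_1 \subset \mathcal{Z}(\mathfrak{a}_1)$. Let $w$ be a polynomial of $\mathfrak{a}_1$.
Then by definition of $\mathfrak{a}_1$, $w$ can be written as $w = w_1 + w_2$, where $w_1 \in \langle \mathcal{I}_{\mathcal{X},m} \rangle$ and
$w_2 \in \langle b_1^\top x \rangle$. Now take any point $y \in \mathcal{A}_1$. Since $y \in \mathcal{A}$, and $\mathcal{I}_{\mathcal{X},m} = \mathcal{I}_{\mathcal{A},m}$, we must
have $w_1(y) = 0$. Since $y \in \mathcal{V}_1$, we must have that $w_2(y) = 0$. Hence $w(y) = 0$, i.e.,
every point of $\mathcal{A}_1$ is inside the zero set of $\mathfrak{a}_1$. ($\Leftarrow$): We will show that $\mathcal{A}_1 \supset \mathcal{Z}(\mathfrak{a}_1)$.
Let $y \in \mathcal{Z}(\mathfrak{a}_1)$, i.e., every element of $\mathfrak{a}_1$ vanishes on $y$. Hence every element of $\mathcal{I}_{\mathcal{X},m}$
vanishes on $y$, i.e., $y \in \mathcal{Z}(\mathcal{I}_{\mathcal{X},m}) = \mathcal{A}$. In addition, every element of $\langle b_1^\top x \rangle$ vanishes
on $y$, in particular $b_1^\top y = 0$, i.e., $y \in \mathcal{V}_1$. □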

In summary, the computation of the vector $b_1 \perp \mathcal{S}_1$ completes algebraically the
first step of the filtration, which gives us the hyperplane $\mathcal{V}_1$ and the sub-variety $\mathcal{A}_1$.
Then, there are two possibilities: $\mathcal{A}_1 = \mathcal{V}_1$ or $\mathcal{A}_1 \subsetneq \mathcal{V}_1$. In the first case, we need to
terminate the filtration, as explained in section 4.3, while in the second case we need
to take one more step in the filtration, as explained in section 4.4.

4.3. Deciding whether to take a second step in a filtration. If $\mathcal{A}_1 = \mathcal{V}_1$, we
should terminate the filtration because in this case $\mathcal{V}_1 = \mathcal{S}_1$, as Lemma 19 shows, and
so we have already identified one of the subspaces. Lemma 20 will give us an algebraic
procedure for checking if the condition $\mathcal{A}_1 = \mathcal{V}_1$ holds true, while Lemma 21 will give
us a computationally more friendly procedure for checking the same condition.

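Lemma 19. $\mathcal{V}_1 = \mathcal{A}_1$ if and only if $\mathcal{V}_1 = \mathcal{S}_1$.

Proof. ($\Rightarrow$): Suppose $\mathcal{V}_1 = \mathcal{A}_1 = \mathcal{S}_1 \cup (\mathcal{S}_2 \cap \mathcal{V}_1) \cup \cdots \cup (\mathcal{S}_n \cap \mathcal{V}_1)$. Taking the
vanishing-ideal operator on both sides, we obtain

$$\mathcal{I}_{\mathcal{V}_1} = \mathcal{I}_{\mathcal{S}_1} \cap \mathcal{I}_{\mathcal{S}_2 \cap \mathcal{V}_1} \cap \cdots \cap \mathcal{I}_{\mathcal{S}_n \cap \mathcal{V}_1}. \tag{25}$$

Since $\mathcal{V}_1$ is a linear subspace, $\mathcal{I}_{\mathcal{V}_1}$ is a prime ideal by Proposition 53, and so by
Proposition 32 $\mathcal{I}_{\mathcal{V}_1}$ must contain one of the ideals $\mathcal{I}_{\mathcal{S}_1}, \mathcal{I}_{\mathcal{S}_2 \cap \mathcal{V}_1}, \dots, \mathcal{I}_{\mathcal{S}_n \cap \mathcal{V}_1}$. Suppose
that $\mathcal{I}_{\mathcal{V}_1} \supset \mathcal{I}_{\mathcal{S}_i \cap \mathcal{V}_1}$ for some $i > 1$. Taking the zero-set operator on both sides,
and using Proposition 44 and the fact that linear subspaces are closed in the Zariski
topology, we obtain $\mathcal{V}_1 \subset \mathcal{S}_i \cap \mathcal{V}_1$, which implies that $\mathcal{V}_1 \subset \mathcal{S}_i$. Since $\mathcal{S}_1 \subset \mathcal{V}_1$, we must
have that $\mathcal{S}_1 \subset \mathcal{S}_i$, which contradicts the assumption of transversality on $\mathcal{A}$. Hence it
must be the case that $\mathcal{I}_{\mathcal{V}_1} \supset \mathcal{I}_{\mathcal{S}_1}$. Taking the zero-set operator on both sides we get
$\mathcal{V}_1 \subset \mathcal{S}_1$, which implies that $\mathcal{V}_1 = \mathcal{S}_1$, since $\mathcal{S}_1 \subset \mathcal{V}_1$. ($\Leftarrow$): Suppose $\mathcal{V}_1 = \mathcal{S}_1$. Then
$\mathcal{V}_1 = \mathcal{S}_1 \subset \mathcal{A}_1 \subset \mathcal{V}_1 = \mathcal{S}_1$ and so $\mathcal{A}_1 = \mathcal{V}_1$. □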

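Knowing that a filtration terminates if $\mathcal{A}_1 = \mathcal{V}_1$, we need a mechanism for checking
this condition. The next lemma shows how this can be done in the algebraic domain.

Lemma 20. $\mathcal{V}_1 = \mathcal{A}_1$ if and only if $\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$.

Proof. ($\Rightarrow$): Suppose $\mathcal{A}_1 = \mathcal{V}_1$. Then $\mathcal{A} \supset \mathcal{V}_1$ and by taking vanishing ideals
on both sides we get $\mathcal{I}_{\mathcal{A}} \subset \mathcal{I}_{\mathcal{V}_1} = \langle b_1^\top x \rangle$. Since $\mathcal{I}_{\mathcal{X},m} = \mathcal{I}_{\mathcal{A},m} \subset \mathcal{I}_{\mathcal{A}}$, it follows that
$\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$. ($\Leftarrow$): Suppose $\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$ and for the sake of contradiction
suppose that $\mathcal{A}_1 \subsetneq \mathcal{V}_1$. In particular, from Lemma 19 we have that $\mathcal{S}_1 \subsetneq \mathcal{V}_1$. Hence,
there exists a vector $\zeta_1$ linearly independent from $b_1$ such that $\zeta_1 \perp \mathcal{S}_1$. Now for
any $i > 1$, there exists $\zeta_i$ linearly independent from $b_1$ such that $\zeta_i \perp \mathcal{S}_i$. For if not,
then $\mathcal{I}_{\mathcal{S}_i} \subset \mathcal{I}_{\mathcal{V}_1}$ and so $\mathcal{S}_i \supset \mathcal{V}_1$, which leads to the contradiction $\mathcal{S}_i \supset \mathcal{S}_1$. Then the
polynomial $(\zeta_1^\top x)^{m-n+1} (\zeta_2^\top x) \cdots (\zeta_n^\top x)$ is an element of $\mathcal{I}_{\mathcal{A},m} = \mathcal{I}_{\mathcal{X},m}$ and by the hypothesis that
$\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$ we must have that $(\zeta_1^\top x)^{m-n+1} \cdots (\zeta_n^\top x) \in \langle b_1^\top x \rangle$. But $\langle b_1^\top x \rangle$ is a
prime ideal and so one of the factors of $(\zeta_1^\top x) \cdots (\zeta_n^\top x)$ must lie in $\langle b_1^\top x \rangle$. So suppose
$\zeta_j^\top x \in \langle b_1^\top x \rangle$, for some $j \in [n]$. This implies that there must exist a polynomial $h$
such that $\zeta_j^\top x = h\,(b_1^\top x)$. By degree considerations, we conclude that $h$ must be a
constant, in which case the above equality implies that $\zeta_j$ is colinear with $b_1$. But this is a contradiction
on the definition of $\zeta_j$. Hence it cannot be the case that $\mathcal{A}_1 \subsetneq \mathcal{V}_1$. □

(Footnote to Lemma 18: Lemma 18 is a special case of Proposition 43.)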

Notice that checking the condition $\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$ in Lemma 20 requires computing
a basis of $\mathcal{I}_{\mathcal{X},m}$ and checking whether each element of the basis is divisible
by the linear form $b_1^\top x$. Equivalently, to check the inclusion of finite dimensional
vector spaces $\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$ we need to compute a basis $B_{\mathcal{X},m}$ of $\mathcal{I}_{\mathcal{X},m}$ as well as a
basis $B$ of $\langle b_1^\top x \rangle_m$ and check whether the rank equality $\mathrm{rank}([B_{\mathcal{X},m} \; B]) = \mathrm{rank}(B)$
holds true. Note that a basis of $\langle b_1^\top x \rangle_m$ can be obtained in a straightforward manner
by multiplying all monomials of degree $m - 1$ with the linear form $b_1^\top x$. On the
other hand, computing a basis of $\mathcal{I}_{\mathcal{X},m}$ by computing a basis for the right nullspace
of $\nu_m(\mathcal{X})$ can be computationally expensive, particularly when $m$ is large. If however
the points $\mathcal{X} \cap \mathcal{S}_1$ are in general position in $\mathcal{S}_1$ with respect to degree $m$, then
checking the condition $\mathcal{I}_{\mathcal{X},m} \subset \langle b_1^\top x \rangle_m$ can be done more efficiently, as we now explain.
Let $V_1 = [v_1, \dots, v_{D-1}]$ be a basis for the vector space $\mathcal{V}_1$. Then $\mathcal{V}_1$ is
isomorphic to $\mathbb{R}^{D-1}$ under the linear map $\sigma_{V_1} : \mathcal{V}_1 \to \mathbb{R}^{D-1}$ that takes a vector
$v = \alpha_1 v_1 + \cdots + \alpha_{D-1} v_{D-1}$ to its coordinate representation $(\alpha_1, \dots, \alpha_{D-1})^\top$. Then
the next result says that checking the condition $\mathcal{V}_1 = \mathcal{A}_1$ is equivalent to checking
the rank-deficiency of the embedded data matrix $\nu_m(\sigma_{V_1}(\mathcal{X} \cap \mathcal{V}_1))$, which is computationally
a simpler task than computing the right nullspace of $\nu_m(\mathcal{X})$.

Lemma 21. Suppose that $\mathcal{X}_1$ is in general position inside $\mathcal{S}_1$ with respect to degree
$m$. Then $\mathcal{V}_1 = \mathcal{A}_1$ if and only if the embedded data matrix $\nu_m(\sigma_{V_1}(\mathcal{X} \cap \mathcal{V}_1))$ is full
rank.

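Proof. The statement is equivalent to the statement "$\mathcal{V}_1 = \mathcal{A}_1$ if and only if
$\mathcal{I}_{\mathcal{X} \cap \mathcal{V}_1, m} = \langle b_1^\top x \rangle_m$", which we now prove. ($\Rightarrow$): Suppose $\mathcal{V}_1 = \mathcal{A}_1$. Then by Lemma
19 $\mathcal{V}_1 = \mathcal{S}_1$, which implies that $\mathcal{I}_{\mathcal{S}_1} = \langle b_1^\top x \rangle$. This in turn implies that $\mathcal{I}_{\mathcal{S}_1,m} =
\langle b_1^\top x \rangle_m$. Now $\mathcal{I}_{\mathcal{X} \cap \mathcal{V}_1, m} = \mathcal{I}_{\mathcal{X} \cap \mathcal{S}_1, m} = \mathcal{I}_{\mathcal{X}_1, m}$. By the general position hypothesis
on $\mathcal{X}_1$ we have $\mathcal{I}_{\mathcal{S}_1,m} = \mathcal{I}_{\mathcal{X}_1,m}$. Hence $\mathcal{I}_{\mathcal{X} \cap \mathcal{V}_1, m} = \langle b_1^\top x \rangle_m$. ($\Leftarrow$): Suppose that
$\mathcal{I}_{\mathcal{X} \cap \mathcal{V}_1, m} = \langle b_1^\top x \rangle_m$. For the sake of contradiction, suppose that $\mathcal{A}_1 \subsetneq \mathcal{V}_1$. Since $\mathcal{A}_1$ is
an arrangement of at most $m$ subspaces, there exists a homogeneous polynomial $p$ of
degree at most $m$ that vanishes on $\mathcal{A}_1$ but does not vanish on $\mathcal{V}_1$. Since $\mathcal{X} \cap \mathcal{V}_1 \subset \mathcal{A}_1$,
$p$ will vanish on $\mathcal{X} \cap \mathcal{V}_1$, i.e., $p \in \mathcal{I}_{\mathcal{X} \cap \mathcal{V}_1, m}$ or equivalently $p \in \langle b_1^\top x \rangle_m$ by hypothesis.
But then $p$ vanishes on $\mathcal{V}_1$, which is a contradiction; hence it must be the case that
$\mathcal{V}_1 = \mathcal{A}_1$. □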
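The rank test of Lemma 21 lends itself to a short numerical sketch. The code below assumes NumPy and noiseless data; the function names, the tolerance, and the use of an orthonormal basis of the ambient space obtained from an SVD are choices made for illustration and are not taken from the paper. The matrix `B` is assumed to hold, as columns, the normal vectors found so far, so that the current ambient space is the orthogonal complement of its column span.

```python
import itertools
import numpy as np

def veronese(X, m):
    """Row j holds all degree-m monomials of the point X[j]."""
    combos = itertools.combinations_with_replacement(range(X.shape[1]), m)
    return np.column_stack([np.prod(X[:, list(c)], axis=1) for c in combos])

def filtration_terminates(X, B, m, tol=1e-9):
    """Rank test: True if the embedded coordinates of X ∩ V are full column rank,
    where V = Span(columns of B)^perp."""
    B = np.asarray(B, dtype=float).reshape(X.shape[1], -1)
    in_V = np.linalg.norm(X @ B, axis=1) < tol      # points of X lying in V
    if not in_V.any():
        return False
    U, s, _ = np.linalg.svd(B)                      # U is D x D
    r = int(np.sum(s > tol))
    basis = U[:, r:]                                # orthonormal basis of V
    coords = X[in_V] @ basis                        # coordinate map sigma_V
    E = veronese(coords, m)
    return bool(np.linalg.matrix_rank(E) == E.shape[1])

# Toy data (assumed): the plane x3 = 0 and the line spanned by (0, 0, 1) in R^3.
X = np.array([[1, 2, 0], [3, -1, 0], [0, 1, 0], [0, 2, 0], [2, 2, 0],
              [0, 0, 1], [0, 0, 2]], dtype=float)
B1 = np.array([[1.0], [0.0], [0.0]])                       # after one step
B2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])        # after two steps
print(filtration_terminates(X, B1, m=2), filtration_terminates(X, B2, m=2))
```

In this toy run the reference point is taken on the line: after one step the ambient plane still contains a second line of the intermediate arrangement, so the embedded matrix is rank deficient, while after two steps the ambient space coincides with the line and the matrix is full rank. The test is only meaningful under the general position hypothesis of the lemma, which in particular requires enough data points inside the current ambient space.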

4.4. Taking multiple steps in a filtration and terminating. If $\mathcal{A}_1 \subsetneq \mathcal{V}_1$,
then it follows from Lemma 19 that $\mathcal{S}_1 \subsetneq \mathcal{V}_1$. Therefore, subspace $\mathcal{S}_1$ has not yet
been identified in the first step of the filtration and we should take a second step.
As before, we can start constructing the second step of our filtration by choosing a
suitable vanishing polynomial $p_2$, such that its gradient at the reference point $x$ is
not colinear with $b_1$. The next Lemma shows that such a $p_2$ always exists.

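Lemma 22. $\mathcal{X}$ admits a homogeneous vanishing polynomial $p_2$ of degree $\ell \le n$,
such that $p_2 \notin \mathcal{I}_{\mathcal{V}_1}$ and $\nabla p_2|_x \notin \mathrm{Span}(b_1)$.

Proof. Since $\mathcal{A}_1 \subsetneq \mathcal{V}_1$, Lemma 19 implies that $\mathcal{S}_1 \subsetneq \mathcal{V}_1$. Then there exists a vector
$\zeta_1$ that is orthogonal to $\mathcal{S}_1$ and is linearly independent from $b_1$. Since $x \in \mathcal{S}_1 - \bigcup_{i>1} \mathcal{S}_i$,
for each $i > 1$ we can find a vector $\zeta_i$ such that $\zeta_i \not\perp x$ and $\zeta_i \perp \mathcal{S}_i$. Notice that the
pairs $b_1, \zeta_i$ are linearly independent for $i > 1$, since $b_1 \perp x$ but $\zeta_i \not\perp x$. Now, the
polynomial $p_2 := (\zeta_1^\top x) \cdots (\zeta_n^\top x)$ has degree $n$ and vanishes on $\mathcal{A}$, hence $p_2 \in \mathcal{I}_{\mathcal{X},n}$.
Moreover, $\nabla p_2|_x = (\zeta_2^\top x) \cdots (\zeta_n^\top x)\, \zeta_1 \neq 0$, since by hypothesis $\zeta_i^\top x \neq 0$, $\forall i > 1$.
Since $\zeta_1$ is linearly independent from $b_1$, we have $\nabla p_2|_x \notin \mathrm{Span}(b_1)$. Finally, $p_2$ does
not vanish on $\mathcal{V}_1$, by a similar argument to the one used in the proof of Lemma 20. □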

Remark 23. Note that if $\ell$ is the degree of $p_2$ as in Lemma 22, and if $q_1, \dots, q_s$
is a basis for $\mathcal{I}_{\mathcal{X},\ell}$, then at least one of the $q_i$ satisfies the conditions of the Lemma.
This is important algorithmically, because it implies that the search for our $p_2$ can be
done sequentially. We can start by first computing a minimal-degree polynomial in
$\mathcal{I}_{\mathcal{A},k}$, and see if it satisfies our requirements. If not, then we can compute a second
linearly independent polynomial and check again. We can continue in that fashion
until we have computed a full basis for $\mathcal{I}_{\mathcal{X},k}$. If no suitable polynomial has been found,
we can repeat the process for degree $k + 1$, and so on, until we have reached degree $n$,
if necessary.

By using a polynomial $p_2$ as in Lemma 22, Proposition 56 guarantees that $\nabla p_2|_x$
will be orthogonal to $\mathcal{S}_1$. Recall though that for the purpose of the filtration we are
interested in constructing a hyperplane $\mathcal{V}_2$ of $\mathcal{V}_1$. Since there is no guarantee that
$\nabla p_2|_x$ is inside $\mathcal{V}_1$ (thus defining a hyperplane of $\mathcal{V}_1$), we must project $\nabla p_2|_x$ onto $\mathcal{V}_1$
and guarantee that this projection is still orthogonal to $\mathcal{S}_1$. The next Lemma ensures
that this is always the case.

Lemma 24. Let $0 \neq p_2 \in \mathcal{I}_{\mathcal{X},\le m} - \mathcal{I}_{\mathcal{V}_1}$ such that $\nabla p_2|_x \notin \mathrm{Span}(b_1)$. Then
$0 \neq \pi_{\mathcal{V}_1}(\nabla p_2|_x) \perp \mathcal{S}_1$.

Proof. For the sake of contradiction, suppose that $\pi_{\mathcal{V}_1}(\nabla p_2|_x) = 0$. Setting $b_{11} :=
b_1$, let us augment $b_{11}$ to a basis $b_{11}, b_{12}, \dots, b_{1c}$ for the orthogonal complement of $\mathcal{S}_1$
in $\mathbb{R}^D$. In fact, we can choose the vectors $b_{12}, \dots, b_{1c}$ to be a basis for the orthogonal
complement of $\mathcal{S}_1$ inside $\mathcal{V}_1$. By Proposition 52, $p_2$ must have the form

$$p_2(x) = q_1(x)(b_{11}^\top x) + q_2(x)(b_{12}^\top x) + \cdots + q_c(x)(b_{1c}^\top x), \tag{26}$$

where $q_1, \dots, q_c$ are homogeneous polynomials of degree $\deg(p_2) - 1$. Then

$$\nabla p_2|_x = q_1(x) b_{11} + q_2(x) b_{12} + \cdots + q_c(x) b_{1c}. \tag{27}$$

Projecting the above equation orthogonally onto $\mathcal{V}_1$ we get

$$\pi_{\mathcal{V}_1}(\nabla p_2|_x) = q_2(x) b_{12} + \cdots + q_c(x) b_{1c}, \tag{28}$$

which is zero by hypothesis. Since $b_{12}, \dots, b_{1c}$ are linearly independent vectors of $\mathcal{V}_1$,
it must be the case that $q_2(x) = \cdots = q_c(x) = 0$. But this implies that $\nabla p_2|_x =
q_1(x) b_{11}$, which is a contradiction on the non-colinearity of $\nabla p_2|_x$ with $b_{11}$. Hence it
must be the case that $0 \neq \pi_{\mathcal{V}_1}(\nabla p_2|_x)$. The fact that $\pi_{\mathcal{V}_1}(\nabla p_2|_x) \perp \mathcal{S}_1$ follows from
(28) and the fact that by definition $b_{12}, \dots, b_{1c}$ are orthogonal to $\mathcal{S}_1$. □
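The projection step used in Lemma 24 is a plain orthogonal projection onto the complement of the normals found so far, and can be sketched as follows. The code assumes NumPy and linearly independent normals; the function names are hypothetical, and the numbers in the usage lines are made-up values standing in for $q_1(x)$ and $q_2(x)$ in (27).

```python
import numpy as np

def project_onto_ambient(g, B):
    """Orthogonal projection of g onto V = Span(columns of B)^perp.
    Assumes the columns of B are linearly independent."""
    Q, _ = np.linalg.qr(B)          # orthonormal basis of Span(B)
    return g - Q @ (Q.T @ g)

def extend_normals(B, g):
    """Append the projected gradient as the next normal vector b_{s+1}."""
    return np.column_stack([B, project_onto_ambient(g, B)])

# Toy usage (assumed values): S1 = Span(e1) in R^3, b_11 = e3, b_12 = e2.
B = np.array([[0.0], [0.0], [1.0]])             # b_1 = e3, so V1 = {x3 = 0}
g = 2.0 * np.array([0.0, 0.0, 1.0]) + 5.0 * np.array([0.0, 1.0, 0.0])
b2 = project_onto_ambient(g, B)                 # keeps only the b_12 component
B_next = extend_normals(B, g)                   # columns b_1, b_2 define V2
print(b2)
```

As in (28), the projection removes the component of the gradient along $b_{11}$ and keeps the part spanned by $b_{12}, \dots, b_{1c}$, which lies in $\mathcal{V}_1$ and is orthogonal to $\mathcal{S}_1$.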
At this point, letting $b_2 := \pi_{\mathcal{V}_1}(\nabla p_2|_x)$, we can define $\mathcal{V}_2 = \mathrm{Span}(b_1, b_2)^\perp$, which
is a subspace of codimension 1 inside $\mathcal{V}_1$ (and hence of codimension 2 inside $\mathcal{V}_0 := \mathbb{R}^D$).
As before, we can define a subspace sub-arrangement $\mathcal{A}_2$ of $\mathcal{A}_1$ by intersecting $\mathcal{A}_1$
with $\mathcal{V}_2$. Once again, this intersection can be realized in the algebraic domain as
$\mathcal{A}_2 = \mathcal{Z}(\mathcal{I}_{\mathcal{X},m}, b_1^\top x, b_2^\top x)$. Next, we have a similar result as in Lemmas 19 and 20,
which we now prove in general form:

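Lemma 25. Let $\boldsymbol{b}_1, \ldots, \boldsymbol{b}_s$ be $s$ vectors orthogonal to $\mathcal{S}_1$ and define the intermediate
ambient space $\mathcal{V}_s := \mathrm{Span}(\boldsymbol{b}_1, \ldots, \boldsymbol{b}_s)^\perp$. Let $\mathcal{A}_s$ be the subspace arrangement
obtained by intersecting $\mathcal{A}$ with $\mathcal{V}_s$. Then the following are equivalent:

(i) $\mathcal{V}_s = \mathcal{A}_s$
(ii) $\mathcal{V}_s = \mathcal{S}_1$
(iii) $\mathcal{S}_1 = \mathrm{Span}(\boldsymbol{b}_1, \ldots, \boldsymbol{b}_s)^\perp$
(iv) $\mathcal{I}_{\mathcal{X},m} \subset \langle \boldsymbol{b}_1^\top x, \ldots, \boldsymbol{b}_s^\top x \rangle_m$.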

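Proof. (i) ⇒ (ii): By taking vanishing ideals on both sides of $\mathcal{V}_s = \bigcup_{i=1}^{n} (\mathcal{S}_i \cap \mathcal{V}_s)$
we get $\mathcal{I}_{\mathcal{V}_s} = \bigcap_{i=1}^{n} \mathcal{I}_{\mathcal{S}_i \cap \mathcal{V}_s}$. By using Proposition 32 in a similar fashion as
in the proof of Lemma 19, we conclude that $\mathcal{V}_s = \mathcal{S}_1$. (ii) ⇒ (iii): This is obvious
from the definition of $\mathcal{V}_s$. (iii) ⇒ (iv): Let $h \in \mathcal{I}_{\mathcal{X},m}$. Then $h$ vanishes on $\mathcal{A}$ and
hence on $\mathcal{S}_1$, and by Proposition 52 we must have that $h \in \mathcal{I}_{\mathcal{S}_1} = \langle \boldsymbol{b}_1^\top x, \ldots, \boldsymbol{b}_s^\top x \rangle$.
(iv) ⇒ (i): $\mathcal{I}_{\mathcal{X},m} \subset \langle \boldsymbol{b}_1^\top x, \ldots, \boldsymbol{b}_s^\top x \rangle_m$ can be written as $\mathcal{I}_{\mathcal{X},m} \subset \mathcal{I}_{\mathcal{V}_s}$. By the general
position assumption $\mathcal{I}_{\mathcal{A},m} = \mathcal{I}_{\mathcal{X},m}$ and so we have $\mathcal{I}_{\mathcal{A},m} \subset \mathcal{I}_{\mathcal{V}_s}$. Taking zero sets on
both sides we get $\mathcal{A} \supset \mathcal{V}_s$, and intersecting both sides of this relation with $\mathcal{V}_s$, we get
$\mathcal{A}_s \supset \mathcal{V}_s$. Since $\mathcal{A}_s \subset \mathcal{V}_s$, this implies that $\mathcal{V}_s = \mathcal{A}_s$. □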

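Similarly to Lemma 21 we have:

Lemma 26. Let $V_s = [\boldsymbol{v}_1, \ldots, \boldsymbol{v}_{D-s}]$ be a basis for $\mathcal{V}_s$, and let $\sigma_{\mathcal{V}_s} : \mathcal{V}_s \to \mathbb{R}^{D-s}$
be the linear map that takes a vector $\boldsymbol{v} = \alpha_1 \boldsymbol{v}_1 + \cdots + \alpha_{D-s} \boldsymbol{v}_{D-s}$ to its coordinate
representation $(\alpha_1, \ldots, \alpha_{D-s})^\top$. Suppose that $\mathcal{X}_1$ is in general position inside $\mathcal{S}_1$
with respect to degree $m$. Then $\mathcal{V}_s = \mathcal{A}_s$ if and only if the embedded data matrix
$\nu_m(\sigma_{\mathcal{V}_s}(\mathcal{X} \cap \mathcal{V}_s))$ is full rank.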
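
As an illustration of the rank test in Lemma 26, the following Python sketch builds the embedded data matrix for points that are already expressed in coordinates of $\mathcal{V}_s$ and checks whether it has full column rank. The helper names `veronese` and `has_full_column_rank`, the ordering of the monomials and the numerical tolerance are assumptions of this sketch and are not taken from the text.

```python
import itertools
import numpy as np

def veronese(Y, degree):
    """Embed the rows of Y (points) via all monomials of the given degree."""
    N, d = Y.shape
    exps = list(itertools.combinations_with_replacement(range(d), degree))
    V = np.ones((N, len(exps)))
    for col, idx in enumerate(exps):
        for i in idx:
            V[:, col] *= Y[:, i]
    return V

def has_full_column_rank(Y, degree, tol=1e-10):
    """True if no nonzero polynomial of this degree vanishes on the rows of Y."""
    V = veronese(Y, degree)
    return bool(np.linalg.matrix_rank(V, tol=tol) == V.shape[1])

# Points on the two coordinate axes of R^2: the polynomial y1*y2 vanishes on them.
on_two_lines = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]])
# Points spread over the plane: no nonzero quadratic vanishes on all of them.
spread_out = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]])
```

In the notation of Lemma 26, a full-rank outcome corresponds to $\mathcal{V}_s = \mathcal{A}_s$, while a rank-deficient outcome indicates that a vanishing polynomial of the chosen degree still exists.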

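By Lemma 25, if $\mathcal{I}_{\mathcal{X},m} \subset \langle \boldsymbol{b}_1^\top x, \boldsymbol{b}_2^\top x \rangle$, the algorithm terminates the filtration with
output the orthogonal basis $\{\boldsymbol{b}_1, \boldsymbol{b}_2\}$ for the orthogonal complement of the irreducible
component $\mathcal{S}_1$ of $\mathcal{A}$. If on the other hand $\mathcal{I}_{\mathcal{X},m} \not\subset \langle \boldsymbol{b}_1^\top x, \boldsymbol{b}_2^\top x \rangle$, then the algorithm
picks a basis element $p_3$ of $\mathcal{I}_{\mathcal{X},m}$ such that $p_3 \notin \langle \boldsymbol{b}_1^\top x, \boldsymbol{b}_2^\top x \rangle$ and $\nabla p_3|_{\boldsymbol{x}} \notin \mathrm{Span}(\boldsymbol{b}_1, \boldsymbol{b}_2)$,
and defines a subspace $\mathcal{V}_3$ of codimension 1 inside $\mathcal{V}_2$ using $\pi_{\mathcal{V}_2}(\nabla p_3|_{\boldsymbol{x}})$ (the existence of
such a $p_3$ is addressed in the footnote). Setting $\boldsymbol{b}_3 := \pi_{\mathcal{V}_2}(\nabla p_3|_{\boldsymbol{x}})$, the algorithm uses
Lemma 25 to determine whether to terminate the filtration or take one more step, and so on.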

The principles established in the previous sections formally lead us to the algebraic
descending filtration Algorithm 2 and its Theorem 27 of correctness.

Theorem 27 (Correctness of Algorithm 2). Let $\mathcal{X} = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}$ be a finite
set of points in general position (Definition 12) with respect to degree $m$ inside a
transversal (Definition 4) arrangement $\mathcal{A}$ of at most $m$ linear subspaces of $\mathbb{R}^D$. Let
$p$ be a polynomial of minimal degree that vanishes on $\mathcal{X}$. Then there always exists a
nonsingular $\boldsymbol{x} \in \mathcal{X}$ such that $\nabla p|_{\boldsymbol{x}} \neq 0$, and for such an $\boldsymbol{x}$, the output $\mathfrak{B}$ of Algorithm 2
is an orthogonal basis for the orthogonal complement in $\mathbb{R}^D$ of the irreducible
component of $\mathcal{A}$ that contains $\boldsymbol{x}$.


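Footnote: The proof of existence of such a $p_3$ is similar to the proof of Lemma 22 and is omitted.

Algorithm 2 Algebraic Descending Filtration (ADF)

procedure ADF($p$, $\boldsymbol{x}$, $\mathcal{X}$, $m$)
    $\mathfrak{B} \leftarrow \{\nabla p|_{\boldsymbol{x}}\}$;
    while $\mathcal{I}_{\mathcal{X},m} \not\subset \langle \boldsymbol{b}^\top x : \boldsymbol{b} \in \mathfrak{B} \rangle$ do
        find $p \in \mathcal{I}_{\mathcal{X},\le m} - \langle \boldsymbol{b}^\top x : \boldsymbol{b} \in \mathfrak{B} \rangle$ s.t. $\nabla p|_{\boldsymbol{x}} \notin \mathrm{Span}(\mathfrak{B})$;
        $\mathfrak{B} \leftarrow \mathfrak{B} \cup \{\pi_{\mathrm{Span}(\mathfrak{B})^\perp}(\nabla p|_{\boldsymbol{x}})\}$;
    end while
    return $\mathfrak{B}$;
end procedure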


4.5. The FASC algorithm. In Sections 4.2-4.4 we established the theory of a
single filtration, according to which one starts with a nonsingular point $\boldsymbol{x}_1 := \boldsymbol{x} \in \mathcal{X} \cap \mathcal{A}$
and obtains an orthogonal basis $\boldsymbol{b}_{11}, \ldots, \boldsymbol{b}_{1c_1}$ for the orthogonal complement
of the irreducible component $\mathcal{S}_1$ of $\mathcal{A}$ that contains reference point $\boldsymbol{x}_1$. To obtain an
orthogonal basis $\boldsymbol{b}_{21}, \ldots, \boldsymbol{b}_{2c_2}$ corresponding to a second irreducible component $\mathcal{S}_2$ of
$\mathcal{A}$, our approach is the natural one: remove $\mathcal{X}_1$ from $\mathcal{X}$ and run a filtration on the set
$\mathcal{X}^{(1)} := \mathcal{X} - \mathcal{X}_1$. All we need for the theory of Sections 4.2-4.4 to be applicable to the
set $\mathcal{X}^{(1)}$ is that $\mathcal{X}^{(1)}$ be in general position inside the arrangement $\mathcal{S}_2 \cup \cdots \cup \mathcal{S}_n$.
But this has been proved in Lemma 15. With Lemma 15 establishing the correctness
of recursive application of a single filtration, the correctness of the FASC Algorithm
3 follows at once, as in Theorem 28. Note that in Algorithm 3, $n$ is the number of
subspaces, while $\mathfrak{D}$ and $\mathfrak{L}$ are ordered sets, such that, up to a permutation, the $i$-th
element of $\mathfrak{D}$ is $d_i = \dim \mathcal{S}_i$, and the $i$-th element of $\mathfrak{L}$ is an orthogonal basis for $\mathcal{S}_i^\perp$.


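Algorithm 3 Filtrated Algebraic Subspace Clustering

procedure FASC($\mathcal{X}$, $m$)
    $n \leftarrow 0$; $\mathfrak{D} \leftarrow \emptyset$; $\mathfrak{L} \leftarrow \emptyset$;
    while $\mathcal{X} \neq \emptyset$ do
        find polynomial $p$ of minimal degree that vanishes on $\mathcal{X}$;
        find $\boldsymbol{x} \in \mathcal{X}$ s.t. $\nabla p|_{\boldsymbol{x}} \neq 0$;
        $\mathfrak{B} \leftarrow$ ADF($p$, $\boldsymbol{x}$, $\mathcal{X}$, $m$);
        $\mathfrak{L} \leftarrow \mathfrak{L} \cup \{\mathfrak{B}\}$;
        $\mathfrak{D} \leftarrow \mathfrak{D} \cup \{D - \mathrm{card}(\mathfrak{B})\}$;
        $\mathcal{X} \leftarrow \mathcal{X} - \mathrm{Span}(\mathfrak{B})^\perp$;
        $n \leftarrow n + 1$; $m \leftarrow m - 1$;
    end while
    return $n$, $\mathfrak{D}$, $\mathfrak{L}$;
end procedure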


Theorem 28 (Correctness of Algorithm 3). Let $\mathcal{X} = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N\}$ be a set in
general position with respect to degree $m$ (Definition 12) inside a transversal (Definition 4)
arrangement $\mathcal{A}$ of at most $m$ linear subspaces of $\mathbb{R}^D$. For such an $\mathcal{X}$ and
$m$, Algorithm 3 always terminates with output a set $\mathfrak{L} = \{\mathfrak{B}_1, \ldots, \mathfrak{B}_n\}$, such that up
to a permutation, $\mathfrak{B}_i$ is an orthogonal basis for the orthogonal complement of the $i$-th
irreducible component $\mathcal{S}_i$ of $\mathcal{A}$, i.e., $\mathcal{S}_i = \mathrm{Span}(\mathfrak{B}_i)^\perp$, $i = 1, \ldots, n$, and $\mathcal{A} = \bigcup_{i=1}^{n} \mathcal{S}_i$.

5. Filtrated Spectral Algebraic Subspace Clustering. In this section we
show how FASC (Sections 3-4) can be adapted to a working subspace clustering
algorithm that is robust to noise. As we will soon see, the success of such an algorithm
depends on being able to 1) implement a single filtration in a robust fashion, and 2)
combine multiple robust filtrations to obtain the clustering of the points.
5.1. Implementing robust filtrations. Recall that the filtration component
ADF (Algorithm 2) of the FASC Algorithm 3 is based on computing a descending
filtration of ambient spaces $\mathcal{V}_1 \supset \mathcal{V}_2 \supset \cdots$. Recall that $\mathcal{V}_1$ is obtained as the hyperplane
of $\mathbb{R}^D$ with normal vector $\nabla p|_{\boldsymbol{x}}$, where $\boldsymbol{x}$ is the reference point associated with
the filtration, and $p$ a polynomial of minimal degree $k$ that vanishes on $\mathcal{X}$. In the
absence of noise, the value of $k$ can be characterized as the smallest $\ell$ such that $\nu_\ell(\mathcal{X})$
drops rank (see section 2.1 for notation). In the presence of noise, and assuming that
$\mathcal{X}$ has cardinality at least $\binom{m+D-1}{m}$, there will be in general no vanishing polynomial
of degree $\le m$, i.e., the embedded data matrix $\nu_\ell(\mathcal{X})$ will have full column rank, for
any $\ell \le m$. Hence, in the presence of noise we do not know a priori what the minimal
degree $k$ is. On the other hand, we do know that $m \ge n$, which implies that the underlying
subspace arrangement $\mathcal{A}$ admits vanishing polynomials of degree $m$. Thus a
reasonable choice for an approximate vanishing polynomial $p_1 := p$ is the polynomial
whose coefficients are given by the right singular vector of $\nu_m(\mathcal{X})$ that corresponds
to the smallest singular value. Recall also that in the absence of noise we chose our
reference point $\boldsymbol{x} \in \mathcal{X}$ such that $\nabla p_1|_{\boldsymbol{x}} \neq 0$. In the presence of noise this condition
will be almost surely true for every point $\boldsymbol{x} \in \mathcal{X}$; then one can select the point that gives
the largest gradient, i.e., we can pick as reference point an $\boldsymbol{x}$ that maximizes the norm
of the gradient $\|\nabla p_1|_{\boldsymbol{x}}\|_2$.
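
The choice of approximate vanishing polynomial and reference point described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the lexicographic ordering of monomials and the requirement of at least as many points as monomials are choices of the sketch, not of the text.

```python
import itertools
import numpy as np

def monomial_exponents(D, degree):
    """Index tuples (i1 <= ... <= i_degree), one per monomial x_{i1} * ... * x_{i_degree}."""
    return list(itertools.combinations_with_replacement(range(D), degree))

def veronese(X, exps):
    """Embedded data matrix: one row per point, one column per monomial."""
    V = np.ones((X.shape[0], len(exps)))
    for col, idx in enumerate(exps):
        for i in idx:
            V[:, col] *= X[:, i]
    return V

def approximate_vanishing_polynomial(X, degree):
    """Coefficients given by the right singular vector of the smallest singular value."""
    exps = monomial_exponents(X.shape[1], degree)
    V = veronese(X, exps)
    if V.shape[0] < V.shape[1]:
        raise ValueError("sketch assumes at least as many points as monomials")
    _, _, Vt = np.linalg.svd(V, full_matrices=False)
    return Vt[-1], exps

def gradients(X, coeffs, exps):
    """Row j is the gradient of the polynomial evaluated at the point X[j]."""
    G = np.zeros_like(X, dtype=float)
    for c, idx in zip(coeffs, exps):
        for pos in range(len(idx)):
            rest = idx[:pos] + idx[pos + 1:]      # drop one factor (product rule)
            term = np.full(X.shape[0], c)
            for i in rest:
                term = term * X[:, i]
            G[:, idx[pos]] += term
    return G

def pick_reference_point(G):
    """Index of the point whose gradient has the largest Euclidean norm."""
    return int(np.argmax(np.linalg.norm(G, axis=1)))
```

For points on the two coordinate axes of the plane and degree 2, the recovered polynomial is, up to sign, the product of the two coordinates, and its gradient at a point on one axis is normal to that axis.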

Moving on, ADF constructs the filtration of $\mathcal{X}$ by intersecting $\mathcal{X}$ with the intermediate
ambient spaces $\mathcal{V}_1 \supset \mathcal{V}_2 \supset \cdots$. In the presence of noise in the dataset $\mathcal{X}$,
such intersections will almost surely be empty. As it turns out, we can replace the
operation of intersecting $\mathcal{X}$ with the intermediate spaces $\mathcal{V}_s$, $s = 1, 2, \ldots$, by projecting
$\mathcal{X}$ onto $\mathcal{V}_s$. In the absence of noise, the norm of the points of $\mathcal{X}$ that lie in $\mathcal{V}_s$
will remain unchanged after projection, while points that lie outside $\mathcal{V}_s$ will witness
a drop in their norm upon projection onto $\mathcal{V}_s$. Points whose norm is reduced can
then be removed and the end result of this process is equivalent to intersecting $\mathcal{X}$
with $\mathcal{V}_s$. In the presence of noise one can choose a threshold $\delta > 0$, such that if the
distance of a point from subspace $\mathcal{V}_s$ is less than $\delta$, then the point is maintained after
projection onto $\mathcal{V}_s$, otherwise it is removed. But how to choose $\delta$? One reasonable
way to proceed is to consider the polynomial $p$ that corresponds to the right singular
vector of $\nu_m(\mathcal{X})$ of smallest singular value, and then consider the quantity

(29) $\beta(\mathcal{X}) := \frac{1}{N} \sum_{j=1}^{N} \left| \left\langle \boldsymbol{x}_j, \frac{\nabla p|_{\boldsymbol{x}_j}}{\|\nabla p|_{\boldsymbol{x}_j}\|_2} \right\rangle \right|$.

Notice that in the absence of noise $\dim \mathcal{N}(\nu_m(\mathcal{X})) > 0$ and subsequently $\beta(\mathcal{X}) = 0$.
In the presence of noise however, $\beta(\mathcal{X})$ represents the average distance of a point $\boldsymbol{x}$
in the dataset to the hyperplane that it produces by means of $\nabla p|_{\boldsymbol{x}}$ (in the absence
of noise this distance is zero by Proposition 56). Hence intuitively, $\delta$ should be of
the same order of magnitude as $\beta(\mathcal{X})$; a natural choice is to set $\delta := \gamma \cdot \beta(\mathcal{X})$, where
$\gamma$ is a user-defined parameter taking values close to 1. Having projected $\mathcal{X}$ onto $\mathcal{V}_1$
and removed points whose distance from $\mathcal{V}_1$ is larger than $\delta$, we obtain a second
approximate polynomial $p_2$ from the right singular vector of smallest singular value
of the embedded data matrix of the remaining projected points, and so on.
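
The thresholding step can be sketched as below. The sketch assumes that the gradients of the approximate vanishing polynomial at the data points are already available as the rows of an array `G`; the function names are hypothetical, the distance-based test follows the description in this paragraph, and the change of coordinates to a lower-dimensional space is deliberately omitted.

```python
import numpy as np

def beta(X, G):
    """Average distance of each point to the hyperplane defined by its own gradient."""
    normals = G / np.linalg.norm(G, axis=1, keepdims=True)
    return float(np.mean(np.abs(np.sum(X * normals, axis=1))))

def project_and_filter(X, normal, delta):
    """Keep points within distance delta of the hyperplane orthogonal to `normal`,
    and project the kept points onto that hyperplane (ambient coordinates)."""
    n = normal / np.linalg.norm(normal)
    signed_dist = X @ n
    keep = np.abs(signed_dist) <= delta
    projected = X[keep] - np.outer(signed_dist[keep], n)
    return projected, keep
```

A threshold in the spirit of the text would then be obtained as `delta = gamma * beta(X, G)` for a user-chosen `gamma`.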

It remains to devise a robust criterion for terminating the filtration. Recall that
the criterion for terminating the filtration in ADF is $\mathcal{I}_{\mathcal{X},m} \subset \langle \boldsymbol{b}_1^\top x, \ldots, \boldsymbol{b}_s^\top x \rangle_m$, where
$\mathcal{V}_s = \mathrm{Span}(\boldsymbol{b}_1, \ldots, \boldsymbol{b}_s)^\perp$. Checking this criterion is equivalent to checking the inclusion
$\mathcal{I}_{\mathcal{X},m} \subset \langle \boldsymbol{b}_1^\top x, \ldots, \boldsymbol{b}_s^\top x \rangle_m$ of finite dimensional vector spaces. In principle, this
requires computing a basis for the vector space $\mathcal{I}_{\mathcal{X},m}$. Now recall from section 2.6
that it is precisely this computation that renders the classic polynomial differentiation
algorithm unstable to noise; the main difficulty being the correct estimation of
$\dim(\mathcal{I}_{\mathcal{X},m})$ and the dramatic dependence of the quality of clustering on this estimate.
Consequently, for the purpose of obtaining a robust algorithm, it is imperative
to avoid such a computation. But we know from Lemma 26 that, if $\mathcal{X}_i := \mathcal{X} \cap \mathcal{S}_i$ is in
general position inside $\mathcal{S}_i$ with respect to degree $m$ for every $i \in [n]$, then the criterion
for terminating the filtration is equivalent to checking whether in the coordinate representation
of $\mathcal{V}_s$ the points $\mathcal{X} \cap \mathcal{V}_s$ admit a vanishing polynomial of degree $m$. But
this is computationally equivalent to checking whether $\mathcal{N}(\nu_m(\sigma_{\mathcal{V}_s}(\mathcal{X} \cap \mathcal{V}_s))) \neq 0$; see
notation in Lemma 26. This is a much easier problem than estimating $\dim(\mathcal{I}_{\mathcal{X},m})$,
and we solve it implicitly as follows. Recall that in the absence of noise, the norm
of the reference point remains unchanged as it passes through the filtration. Hence,
it is natural to terminate the filtration at step $s$ if the distance from the projected
reference point (see the footnote on the projected reference point) to $\mathcal{V}_{s+1}$ is more
than $\delta$, i.e., if the projected reference point is among
the points that are being removed upon projection from $\mathcal{V}_s$ to $\mathcal{V}_{s+1}$. To guard against
overestimating the number of steps in the filtration, we enhance the termination criterion
by additionally deciding to terminate at step $s$ if the number of points that
survived the projection from $\mathcal{V}_s$ to $\mathcal{V}_{s+1}$ is less than a pre-defined integer $L$, which is
to be thought of as the minimum number of points in a cluster.

5.2. Combining multiple filtrations. Having determined a robust algorithmic
implementation for a single filtration, we face the following issue: In general, two
points lying approximately in the same subspace $\mathcal{S}$ will produce different hyperplanes
that approximately contain $\mathcal{S}$ with different levels of accuracy. In the noiseless case
any point would be equally good. In the presence of noise though, the choice of the
reference point $\boldsymbol{x}$ becomes significant. How should $\boldsymbol{x}$ be chosen? To deal with this
problem in a robust fashion, it is once again natural to construct a single filtration
for each point in $\mathcal{X}$ and define an affinity between points $j$ and $j'$ as

(30) $C_{jj',\mathrm{FSASC}} = \begin{cases} \left\| \pi^{(j)}_{s_j - 1} \circ \cdots \circ \pi^{(j)}_{0}(\boldsymbol{x}_{j'}) \right\| & \text{if } \boldsymbol{x}_{j'} \text{ remains}, \\ 0 & \text{otherwise}, \end{cases}$

where $\pi^{(j)}_s$ is the projection from $\mathcal{V}_s$ to $\mathcal{V}_{s+1}$ associated to the filtration of point $\boldsymbol{x}_j$
and $s_j$ is the length of that filtration. This affinity captures the fact that if points $\boldsymbol{x}_j$
and $\boldsymbol{x}_{j'}$ are in the same subspace, then the norm of $\boldsymbol{x}_{j'}$ should not change from step
0 to step $c$ of the filtration computed with reference point $\boldsymbol{x}_j$, where $c = D - \dim(\mathcal{S})$
is the codimension of the irreducible component $\mathcal{S}$ associated to reference point $\boldsymbol{x}_j$.
Otherwise, if $\boldsymbol{x}_j$ and $\boldsymbol{x}_{j'}$ are in different subspaces, the norm of $\boldsymbol{x}_{j'}$ is expected to be
reduced by the time the filtration reaches step $c$. In the case of noiseless data, only
the points in the correct subspace survive step $c$ and their norms are precisely equal
to one. In the case of noisy data, the affinity defined above will only be approximate.
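
For intuition, the noiseless behaviour of this affinity can be sketched in Python. The sketch assumes unit-norm points and that the normals found along the filtration of one reference point are given as the columns of a matrix `B`; it collapses the step-by-step removal into a single test on the final projection, which is a simplification made here for brevity. The function name and the tolerance are hypothetical.

```python
import numpy as np

def affinity_row_noiseless(X, B, tol=1e-8):
    """Norm of each unit-norm point after projecting out Span(B); 0 if the norm dropped."""
    Q, _ = np.linalg.qr(B)                 # orthonormal basis for Span(B)
    projected = X - (X @ Q) @ Q.T          # orthogonal projection onto Span(B)^perp
    norms = np.linalg.norm(projected, axis=1)
    survives = norms >= 1.0 - tol          # points in the subspace keep norm one
    return np.where(survives, norms, 0.0)
```

Stacking one such row per reference point gives a matrix with ones between points of the same subspace and zeros elsewhere, which is the ideal structure described above.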

Footnote: Here by projected reference point we mean the image of the reference point under all
projections up to step $s$.

5.3. The FSASC algorithm. Having an affinity matrix as in eq. (30), standard
spectral clustering techniques can be applied to obtain a clustering of $\mathcal{X}$ into $n$ groups.
We emphasize that in contrast to the abstract case of Algorithm 3, the number $n$ of
clusters must be given as input to the algorithm. On the other hand, the algorithm
does not require the subspace dimensions to be given: these are implicitly estimated
by means of the filtrations. Finally, one may choose to implement the above scheme
for $M$ distinct values of the parameter $\gamma$ and choose the affinity matrix that leads
to the largest eigengap. The above discussion leads to the Filtrated Spectral
Algebraic Subspace Clustering (FSASC) Algorithm 4, in which

• Spectrum(NL($C + C^\top$)) denotes the spectrum of the normalized Laplacian
matrix of $C + C^\top$,

• SpecClust($C^* + (C^*)^\top$, $n$) denotes spectral clustering being applied to $C^* + (C^*)^\top$
to obtain $n$ clusters,

• Vanishing($\nu_n(\mathcal{X})$) is the polynomial whose coefficients are the right singular
vector of $\nu_n(\mathcal{X})$ corresponding to the smallest singular value,

• $\pi \leftarrow (\mathbb{R}^d \to \mathcal{H} \to \mathbb{R}^{d-1})$ is to be read as "$\pi$ is assigned the composite linear
transformation $\mathbb{R}^d \to \mathcal{H} \to \mathbb{R}^{d-1}$, where the first arrow is the orthogonal
projection of $\mathbb{R}^d$ to hyperplane $\mathcal{H}$, and the second arrow is the linear isomorphism
that maps a basis of $\mathcal{H}$ in $\mathbb{R}^d$ to the standard coordinate basis of $\mathbb{R}^{d-1}$".
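
The eigengap-based selection among candidate affinity matrices can be sketched as follows. The function names are hypothetical, and the small constant guarding against zero degrees is an assumption of the sketch; the eigengap is taken as $\lambda_{n+1} - \lambda_n$ of the normalized Laplacian of $C + C^\top$ with eigenvalues in ascending order.

```python
import numpy as np

def normalized_laplacian_spectrum(C):
    """Ascending eigenvalues of the normalized Laplacian of C + C^T."""
    W = C + C.T
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(np.maximum(d, 1e-12))   # guard against isolated points
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]
    return np.linalg.eigvalsh(L)

def eigengap(C, n):
    """lambda_{n+1} - lambda_n, using 1-based indexing of the eigenvalues."""
    lam = normalized_laplacian_spectrum(C)
    return float(lam[n] - lam[n - 1])

def select_affinity(candidates, n):
    """Index of the candidate with the largest eigengap, together with all gaps."""
    gaps = [eigengap(C, n) for C in candidates]
    return int(np.argmax(gaps)), gaps
```

For two well-separated groups the first $n = 2$ eigenvalues are zero and the gap is large, whereas a fully connected graph yields no gap at that position.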


5.4. A distance-based affinity (SASC-D). Observe that (see the footnote on normalization) the FSASC
affinity (30) between points $\boldsymbol{x}_j$ and $\boldsymbol{x}_{j'}$ can be interpreted as the distance of point $\boldsymbol{x}_{j'}$
to the orthogonal complement of the final ambient space $\mathcal{V}_{s_j}$ of the filtration corresponding
to reference point $\boldsymbol{x}_j$. If all irreducible components of $\mathcal{A}$ were hyperplanes,
then the optimal length of each filtration would be 1. Inspired by this observation,
we may define a simple distance-based affinity, alternative to the angle-based affinity
of eq. (16), by

(31) $C_{jj',\mathrm{dist}} := 1 - \left| \left\langle \boldsymbol{x}_{j'}, \frac{\nabla p|_{\boldsymbol{x}_j}}{\|\nabla p|_{\boldsymbol{x}_j}\|_2} \right\rangle \right|$.

The affinity of eq. (31) is theoretically justified only for hyperplanes, as $C_{jj',\mathrm{angle}}$ is;
yet as we will soon see in the experiments, $C_{jj',\mathrm{dist}}$ is much more robust than $C_{jj',\mathrm{angle}}$
in the case of subspaces of different dimensions. We attribute this phenomenon to
the fact that, in the absence of noise, it is always the case that $C_{jj',\mathrm{dist}} = 1$ whenever
$\boldsymbol{x}_j, \boldsymbol{x}_{j'}$ lie in the same irreducible component; as mentioned in section 2.6, this need
not be the case for $C_{jj',\mathrm{angle}}$. We will be referring to the Spectral ASC method that
uses affinity (31) as SASC-D.
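
A distance-based affinity of this kind is straightforward to compute once the gradients are known. The sketch below assumes unit-norm points in the rows of `X` and the gradient of the vanishing polynomial at each point in the corresponding row of `G`; it takes the affinity to be one minus the absolute inner product between a point and the normalized gradient at the reference point, which is our reading of the definition, and the function name is hypothetical.

```python
import numpy as np

def sasc_d_affinity(X, G):
    """C[j, j'] = 1 - |<x_j', grad p at x_j / its norm>| for unit-norm rows of X."""
    normals = G / np.linalg.norm(G, axis=1, keepdims=True)
    return 1.0 - np.abs(normals @ X.T)

# Two lines in R^2 (the coordinate axes), cut out by p(x) = x1 * x2 with gradient (x2, x1).
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
G = X[:, ::-1].copy()
C = sasc_d_affinity(X, G)
```

In this noiseless toy case the two points on the first axis receive affinity one with each other and zero with the point on the second axis.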

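Footnote on normalization: We will henceforth be assuming that all points $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_N$ are
normalized to unit $\ell_2$-norm.

Algorithm 4 Filtrated Spectral Algebraic Subspace Clustering (FSASC)

procedure FSASC($\mathcal{X}$, $D$, $n$, $L$, $\{\gamma_m\}_{m=1}^{M}$)
    if $N < \mathcal{M}_n(D)$ then
        return ('Not enough points');
    else
        eigengap $\leftarrow 0$; $C^* \leftarrow 0_{N \times N}$;
        $\boldsymbol{x}_j \leftarrow \boldsymbol{x}_j / \|\boldsymbol{x}_j\|$, $\forall j \in [N]$;
        $p \leftarrow$ Vanishing($\nu_n(\mathcal{X})$);
        $\beta \leftarrow \frac{1}{N} \sum_{j=1}^{N} \left| \left\langle \boldsymbol{x}_j, \frac{\nabla p|_{\boldsymbol{x}_j}}{\|\nabla p|_{\boldsymbol{x}_j}\|} \right\rangle \right|$;
        for $k = 1 : M$ do
            $\delta \leftarrow \beta \cdot \gamma_k$; $C \leftarrow 0_{N \times N}$;
            for $j = 1 : N$ do
                $C_{j,:} \leftarrow$ Filtration($\mathcal{X}$, $\boldsymbol{x}_j$, $p$, $L$, $\delta$, $n$);
            end for
            $\{\lambda_s\}_{s=1}^{N} \leftarrow$ Spectrum(NL($C + C^\top$));
            if (eigengap $< \lambda_{n+1} - \lambda_n$) then
                eigengap $\leftarrow \lambda_{n+1} - \lambda_n$; $C^* \leftarrow C$;
            end if
        end for
        $\{\mathcal{Y}_i\}_{i=1}^{n} \leftarrow$ SpecClust($C^* + (C^*)^\top$, $n$);
        return $\{\mathcal{Y}_i\}_{i=1}^{n}$;
    end if
end procedure

function Filtration($\mathcal{X}$, $\boldsymbol{x}$, $p$, $L$, $\delta$, $n$)
    $d \leftarrow D$, $\mathcal{J} \leftarrow [N]$, $q \leftarrow p$, $\boldsymbol{c} \leftarrow 0_{1 \times N}$;
    flag $\leftarrow 1$;
    while ($d > 1$) and (flag $= 1$) do
        $\mathcal{H} \leftarrow \langle \nabla q|_{\boldsymbol{x}} \rangle^\perp$, $\pi \leftarrow (\mathbb{R}^d \to \mathcal{H} \to \mathbb{R}^{d-1})$;
        if $(\|\boldsymbol{x}\| - \|\pi(\boldsymbol{x})\|)/\|\boldsymbol{x}\| > \delta$ then
            if $d = D$ then
                $\boldsymbol{c}(j') \leftarrow \|\pi(\boldsymbol{x}_{j'})\|$, $\forall j' \in [N]$;
            end if
            flag $\leftarrow 0$;
        else
            $\mathcal{J} \leftarrow \{ j' \in [N] : (\|\boldsymbol{x}_{j'}\| - \|\pi(\boldsymbol{x}_{j'})\|)/\|\boldsymbol{x}_{j'}\| \le \delta \}$;
            if $|\mathcal{J}| < L$ then
                flag $\leftarrow 0$;
            else
                $\boldsymbol{c}(j') \leftarrow \|\pi(\boldsymbol{x}_{j'})\|$, $\forall j' \in \mathcal{J}$;
                $\boldsymbol{c}(j') \leftarrow 0$, $\forall j' \in [N] - \mathcal{J}$;
                if $|\mathcal{J}| < \mathcal{M}_n(d)$ then
                    flag $\leftarrow 0$;
                else
                    $d \leftarrow d - 1$, $\boldsymbol{x} \leftarrow \pi(\boldsymbol{x})$;
                    $\boldsymbol{x}_{j'} \leftarrow \pi(\boldsymbol{x}_{j'})$ $\forall j' \in \mathcal{J}$;
                    $\mathcal{X} \leftarrow \{\boldsymbol{x}_{j'} : j' \in \mathcal{J}\}$;
                    $q \leftarrow$ Vanishing($\nu_n(\mathcal{X})$);
                end if
            end if
        end if
    end while
    return ($\boldsymbol{c}$);
end function

5.5. Discussion on the computational complexity. As mentioned in section
2, the main object that needs to be computed in algebraic subspace clustering is a
vanishing polynomial $p$ in $D$ variables of degree $n$, where $D$ is the ambient dimension
of the data and $n$ is the number of subspaces. This amounts to computing a right null-vector
of the $N \times \mathcal{M}_n(D)$ embedded data matrix $\nu_n(\mathcal{X})$, where $\mathcal{M}_n(D) := \binom{n+D-1}{n}$
and $N \ge \mathcal{M}_n(D)$. In practice, the data are noisy and there are usually no vanishing
polynomials of degree $n$; instead one needs to compute the right singular vector of the
embedded data matrix that corresponds to the smallest singular value. Approximate
iterative methods for performing this task do exist [27, 19, 46], and in this work
we use the MATLAB function svds.m, which is based on an inverse-shift iteration
technique; see, e.g., the introduction of [19]. Even though svds.m is in principle more
efficient than computing the full SVD of $\nu_n(\mathcal{X})$ via the MATLAB function svd.m, the
complexity of both functions is of the same order

(32) $\mathcal{O}\left( N \, (\mathcal{M}_n(D))^2 \right)$,

which is the well-known complexity of SVD [12] adapted to the dimensions of $\nu_n(\mathcal{X})$.
This is because svds.m requires at each iteration the solution to a linear system of
equations whose coefficient matrix has size of the same order as the size of $\nu_n(\mathcal{X})$.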

Evidently, the complexity of (32) is prohibitive for large D even for moderate 
values of n. If we discount the spectral clustering step, this is precisely the complexity 
of SASC-A of section 2.6 as well as of SASC-D of section 5.4. On the other hand, 
FSASC (Algorithm 4) is even more computationally demanding, as it requires the 
computation of a vanishing polynomial at each step of every filtration, and there are 
as many filtrations as the total number of points. Assuming for simplicity that there 
is no noise and that the dimensions of all subspaces are equal to d < D, then the 
complexity of a single filtration in FSASC is of the order of 



(33) 


Since FSASC computes a filtration for each and every point, its total complexity
(discounting the spectral clustering step and assuming that we are using a single
value for the parameter $\gamma$) is



(34) 


Even though the filtrations are independent of each other, and hence fully parallelizable,
the complexity of FSASC is still prohibitive for large scale applications even after
parallelization. Nevertheless, when the subspace dimensions are small, FSASC is
applicable after one reduces the dimensionality of the data by means of a projection,
as will be done in section 6.2. In any case, we hope that the complexity issue of
FSASC will be addressed in future research.
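
To get a feeling for how quickly the size of the embedded data matrix grows, the following Python sketch counts the monomials of degree $n$ in $D$ variables and evaluates a rough operation count for one SVD. The count $N \cdot M^2$ for an $N \times M$ matrix with $N \ge M$ is the standard order-of-magnitude estimate for a dense SVD and is used here as an assumption; constants are ignored and the function names are hypothetical.

```python
from math import comb

def num_monomials(n, D):
    """Number of monomials of degree n in D variables."""
    return comb(n + D - 1, n)

def svd_cost_estimate(N, n, D):
    """Rough operation count N * M^2 for the N x M embedded data matrix."""
    M = num_monomials(n, D)
    if N < M:
        raise ValueError("sketch assumes at least as many points as monomials")
    return N * M * M
```

For the synthetic setting of section 6.1 ($n = 3$, $D = 9$, $N = 600$) the embedded data matrix has 165 columns, and every additional ambient dimension or subspace increases this number rapidly.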

6. Experiments. In this section we evaluate experimentally the proposed methods
FSASC (Algorithm 4) and SASC-D (section 5.4) and compare them to other
state-of-the-art subspace clustering methods, using synthetic data (section 6.1), as
well as real motion segmentation data (section 6.2).

6.1. Experiments on synthetic data. We begin by randomly generating $n = 3$
subspaces of various dimension configurations $(d_1, d_2, d_3)$ in $\mathbb{R}^9$. The choice $D = 9$
for the ambient dimension is motivated by applications in two-view geometry [14,
43]. Once the subspaces are randomly generated, we use a zero-mean unit-variance
Gaussian distribution with support on each subspace to randomly sample $N_i = 200$
points per subspace. The points of each subspace are then corrupted by additive zero-mean
Gaussian noise with standard deviation $\sigma \in \{0, 0.01, 0.03, 0.05\}$ and support
in the orthogonal complement of the subspace. All data points are subsequently
normalized to have unit Euclidean norm.
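
A data generation procedure of this kind can be sketched in Python as below. The sketch is only one possible realization: drawing a random subspace from the QR factorization of a Gaussian matrix, the function names, the default arguments and the seed are all assumptions made here for illustration.

```python
import numpy as np

def sample_subspace_points(D, d, num_points, sigma, rng):
    """Unit-norm points near a random d-dimensional subspace of R^D."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, D)))   # random orthonormal basis of R^D
    U, U_perp = Q[:, :d], Q[:, d:]                     # subspace basis and its complement
    clean = rng.standard_normal((num_points, d)) @ U.T
    noise = sigma * rng.standard_normal((num_points, D - d)) @ U_perp.T
    X = clean + noise                                  # noise lives in the complement only
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, U

def generate_dataset(dims=(2, 3, 4), D=9, points_per_subspace=200, sigma=0.01, seed=0):
    """Stack the samples of all subspaces and return them with ground-truth labels."""
    rng = np.random.default_rng(seed)
    blocks, labels = [], []
    for k, d in enumerate(dims):
        X, _ = sample_subspace_points(D, d, points_per_subspace, sigma, rng)
        blocks.append(X)
        labels += [k] * points_per_subspace
    return np.vstack(blocks), np.array(labels)
```

Calling `generate_dataset(dims=(2, 5, 8), sigma=0.05)` would produce a data set for one of the other configurations considered below.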

Using data as above, we compare the proposed methods FSASC (Algorithm 4)
and SASC-D (section 5.4) to the state-of-the-art SASC-A (section 2.6) from algebraic
subspace clustering methods, as well as to state-of-the-art self-expressiveness-based
methods, such as Sparse Subspace Clustering (SSC) [10], Low-Rank Representation
(LRR) [20, 22], Low-Rank Subspace Clustering (LRSC) [37] and Least-Squares Regression
subspace clustering (LSR) [23]. For FSASC we use $L = 10$ and $\gamma = 0.1$.
For SSC we use the Lasso version with $\alpha_z = 20$, where $\alpha_z$ is defined above equation
(14) in [10], and $\rho = 0.7$, where $\rho$ is the thresholding parameter of the SSC affinity
(see MATLAB function thrC.m provided by the authors of [10]). For LRR we use
the ADMM version provided by its first author with $\lambda = 4$ in equation (7) of [21].
For LRSC we use the ADMM method proposed by its authors with $\tau = 420$ and
$\alpha = 4000$, where $\alpha$ and $\tau$ are defined at problem (P) of page 2 in [37]. Finally, for
LSR we use equation (16) in [23] with $\lambda = 0.0048$. For both LRR and LSR we also
report results with the heuristic post-processing of the affinity matrix proposed by the
first author of [21] in their MATLAB function lrr_motion_seg.m; we denote these
versions of LRR and LSR by LRR-H and LSR-H respectively.

Table 1

Mean subspace clustering error in % over 100 independent trials for synthetic data randomly
generated in three random subspaces of $\mathbb{R}^9$ of dimensions $(d_1, d_2, d_3)$. The total number of points
is $N = 600$ with 200 points associated to each subspace. We consider noiseless data ($\sigma = 0$) as
well as data corrupted by zero-mean additive white noise of standard deviation $\sigma$ and support in the
orthogonal complement of each subspace.

method    (2,3,4)  (4,5,6)  (6,7,8)  (2,5,8)  (3,3,3)  (6,6,6)  (7,7,7)  (8,8,8)

$\sigma = 0$
FSASC           0        0        0        0        0        0        0        0
SASC-D          0        0        0        0        0        0        0        0
SASC-A         42       39        6       14       37       24       12        0
SSC             0        1       18       49        0        3       14       55
LRR             0        3       39        5        0        9       42       51
LRR-H           0        3       36        6        0        8       38       51
LRSC            0        3       39        5        0        9       42       51
LSR             0        3       39        5        0        9       42       51
LSR-H           0        3       32        6        0        8       38       51

$\sigma = 0.01$
FSASC           0        0        0        1        0        0        0        5
SASC-D          0        0        1        1        0        0        0        3
SASC-A         54       45        8       24       57       36       13        3
SSC             2        2       18       49        0        3       13       55
LRR             0        3       38        5        0        9       42       51
LRR-H           0        3       36        7        0        8       38       51
LRSC            0        3       38        5        0        9       42       51
LSR             0        3       39        5        0        9       42       51
LSR-H           0        3       32        6        0        8       38       51

$\sigma = 0.03$
FSASC           0        0        1        2        0        0        1       10
SASC-D          0        0        4        3        0        1        2        6
SASC-A         57       46       13       31       58       37       15        7
SSC             0        1       20       48        0        3       13       55

$\sigma = 0.05$
FSASC           1        0        2        3        1        0        2       14
SASC-D          1        1        7        5        1        2        5       10
SASC-A         58       46       17       36       60       39       17       11
SSC             0        2       20       49        0        3       15       55
LRR             1        3       39        6        0       10       42       51
LRR-H           1        3       36       13        0        8       38       52
LRSC            1        3       39        6        0       10       42       51
LSR             1        3       39        6        0       10       42       51
LSR-H           1        3       32        7        0        8       38       51

Table 2

Mean intra-cluster connectivity over 100 independent trials for synthetic data randomly generated
in three random subspaces of $\mathbb{R}^9$ of dimensions $(d_1, d_2, d_3)$. There are 200 points associated to
each subspace, which are corrupted by zero-mean additive white noise of standard deviation $\sigma$ and
support in the orthogonal complement of each subspace.

method    (2,3,4)  (4,5,6)  (6,7,8)  (2,5,8)  (3,3,3)  (6,6,6)  (7,7,7)  (8,8,8)

$\sigma = 0$
FSASC           1        1        1        1        1        1        1        1
SASC-D          1        1        1        1        1        1        1        1
SASC-A       0.37     0.37     0.37     0.39     0.34     0.41     0.37        1
SSC         10^-3     0.01    10^-4    10^-3     0.01     0.02    10^-3    10^-7
LRR          0.59     0.37     0.43     0.31     0.64     0.41     0.45     0.50
LRR-H        0.28     0.23     0.23     0.19     0.31     0.24     0.24     0.26
LRSC         0.59     0.37     0.43     0.31     0.64     0.41     0.45     0.50
LSR          0.59     0.37     0.42     0.31     0.64     0.41     0.45     0.50
LSR-H        0.28     0.24     0.24     0.21     0.31     0.25     0.25     0.27

$\sigma = 0.01$
FSASC        0.05     0.35     0.43     0.10     0.09     0.43     0.42     0.43
SASC-D       0.91     0.93     0.85     0.84     0.94     0.91     0.87     0.85
SASC-A       0.32     0.30     0.12     0.14     0.30     0.29     0.24     0.07
SSC         10^-3     0.01    10^-4    10^-3     0.01     0.02    10^-3    10^-7
LRR          0.42     0.37     0.43     0.31     0.51     0.41     0.45     0.50
LRR-H        0.13     0.23     0.23     0.17     0.22     0.24     0.24     0.26
LRSC         0.42     0.37     0.43     0.31     0.52     0.41     0.45     0.50
LSR          0.41     0.37     0.42     0.31     0.51     0.41     0.45     0.50
LSR-H        0.11     0.24     0.24     0.18     0.21     0.25     0.25     0.27

Table 3

Mean inter-cluster connectivity in % over 100 independent trials for synthetic data randomly
generated in three random subspaces of $\mathbb{R}^9$ of dimensions $(d_1, d_2, d_3)$. There are 200 points associated
to each subspace, which are corrupted by zero-mean additive white noise of standard deviation $\sigma$ and
support in the orthogonal complement of each subspace.

method    (2,3,4)  (4,5,6)  (6,7,8)  (2,5,8)  (3,3,3)  (6,6,6)  (7,7,7)  (8,8,8)

$\sigma = 0$
FSASC           0        0        1        1        0        0        0        2
SASC-D         60       60       60       60       60       60       60       60
SASC-A         55       55       38       43       55       50       42       35
SSC             0        2       22        2        0        7       23       46
LRR             1       49       60       45        0       55       60       63
LRR-H           0       18       43        9        0       32       44       55
LRSC            2       49       60       45        2       55       60       63
LSR             2       49       60       43        2       56       60       64
LSR-H           0       11       24        6        0       19       25       30

$\sigma = 0.01$
FSASC           2        4       22       18        2        6       15       35
SASC-D         62       61       60       61       62       60       60       60
SASC-A         63       58       46       51       64       55       47       39
SSC           0.1        1       23        3      0.1        7       23       46
LRR            17       49       60       45       16       55       60       63
LRR-H           1       18       43        9        1       32       44       55
LRSC           17       49       60       45       16       55       60       63
LSR            17       49       60       46       16       55       60       64
LSR-H         0.1       11       24        6      0.1       19       25       30

Table 4

Mean running time of each method in seconds over 100 independent trials for synthetic data
randomly generated in three random subspaces of $\mathbb{R}^9$ of dimensions $(d_1, d_2, d_3)$. There are 200 points
associated to each subspace, which are corrupted by zero-mean additive white noise of standard
deviation $\sigma = 0.01$ and support in the orthogonal complement of each subspace. The reported
running time is the time required to compute the affinity matrix, and it does not include the spectral
clustering step. The experiment is run in MATLAB on a standard Macbook-Pro with a dual core
2.5GHz Processor and a total of 4GB Cache memory.

method    (2,3,4)  (4,5,6)  (6,7,8)  (2,5,8)  (3,3,3)  (6,6,6)  (7,7,7)  (8,8,8)

$\sigma = 0.01$
FSASC       13.57    12.11     8.34    13.90    13.69    10.67     8.55     6.01
SASC-D       0.03     0.03     0.03     0.03     0.03     0.03     0.03     0.03
SASC-A       0.03     0.03     0.03     0.03     0.03     0.03     0.03     0.03
SSC          5.01     4.84     5.06     6.59     4.90     4.71     4.80     5.03
LRR          0.54     0.36     0.34     0.45     0.53     0.34     0.34     0.34
LRR-H        0.65     0.48     0.45     0.61     0.65     0.46     0.46     0.45
LRSC         0.01     0.01     0.01     0.01     0.01     0.01     0.01     0.01
LSR          0.05     0.05     0.05     0.07     0.05     0.05     0.05     0.05
LSR-H        0.25     0.25     0.24     0.32     0.24     0.24     0.24     0.24

Notice that all compared methods are spectral methods, i.e., they produce a pairwise
affinity matrix $C$ upon which spectral clustering is applied. To evaluate the
quality of the produced affinity, besides reporting the standard subspace clustering
error, which is the percentage of misclassified points, we also report the intra-cluster
and inter-cluster connectivities of the affinity matrices $C$. As an intra-cluster connectivity
we use the minimum algebraic connectivity among the subgraphs corresponding
to the ground truth clusters. The algebraic connectivity of a subgraph is the second
smallest eigenvalue of its normalized Laplacian, and measures how well connected
the graph is. In particular, values close to 1 indicate that the subgraph is indeed
well-connected (single connected component), while values close to 0 indicate that
the subgraph tends to split into at least two connected components. Clearly, from
a clustering point of view, the latter situation is undesirable, since it may lead to
over-segmentation. Finally, as inter-cluster connectivity we use the percentage of the
$\ell_1$-norm of the affinity matrix $C$ that corresponds to erroneous connections, i.e., the
quantity $\left( \sum_{\boldsymbol{x}_j \in \mathcal{S}_i,\, \boldsymbol{x}_{j'} \in \mathcal{S}_{i'},\, i \neq i'} |C_{jj'}| \right) / \|C\|_1$. The smaller the inter-cluster connectivity
is, the fewer erroneous connections the affinity contains. To summarize, a high-quality
affinity matrix is characterized by high intra-cluster and low inter-cluster connectivity,
which is then expected to lead to small spectral clustering error.
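
Both connectivity measures can be computed from an affinity matrix and the ground-truth labels, as in the following Python sketch. The function names are hypothetical; symmetrizing the affinity before forming the Laplacian, reading the $\ell_1$-norm entrywise and returning a fraction rather than a percentage are assumptions of the sketch.

```python
import numpy as np

def algebraic_connectivity(W):
    """Second smallest eigenvalue of the normalized Laplacian of a symmetric affinity W."""
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(np.maximum(d, 1e-12))   # guard against isolated points
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]
    return float(np.linalg.eigvalsh(L)[1])

def intra_cluster_connectivity(C, labels):
    """Minimum algebraic connectivity over the ground-truth cluster subgraphs."""
    W = (np.abs(C) + np.abs(C).T) / 2.0
    values = []
    for k in np.unique(labels):
        idx = np.flatnonzero(labels == k)
        values.append(algebraic_connectivity(W[np.ix_(idx, idx)]))
    return min(values)

def inter_cluster_connectivity(C, labels):
    """Fraction of the entrywise l1-norm of C carried by pairs from different clusters."""
    different = labels[:, None] != labels[None, :]
    return float(np.abs(C)[different].sum() / np.abs(C).sum())
```

Multiplying the second quantity by 100 gives a percentage comparable in form to the entries of Table 3.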

Tables 1-3 show the clustering error, and the intra-cluster and inter-cluster connectivities
associated with each method, averaged over 100 independent experiments.
Inspection of Table 1 reveals that, in the absence of noise ($\sigma = 0$), FSASC gives
exactly zero error across all dimension configurations. This is in agreement with the
theoretical results of section 4, which guarantee that, in the absence of noise, the only
points that survive the filtration associated with some reference point are precisely
the points lying in the same subspace as the reference point. Indeed, notice that
in Table 2 and for $\sigma = 0$ the connectivity attains its maximum value 1, indicating
that the subgraphs corresponding to the ground truth clusters are fully connected.
Moreover in Table 3 we see that for $\sigma = 0$ the erroneous connections are either zero
or negligible. This practically means that each point is connected to each and every
other point from the same subspace, while not connected to any other points, which is
the ideal structure that an affinity matrix should have.

Remarkably, the proposed SASC-D, which is much simpler than FSASC, also
gives zero error for zero noise. Table 2 shows that SASC-D achieves perfect intra-cluster
connectivity, while Table 3 shows that the inter-cluster connectivity associated
with SASC-D is very large. This is clearly an undesirable feature, which nevertheless
seems not to affect the clustering error in this experiment, perhaps because the
intra-cluster connectivity is very high. As we will see later though (section 6.2), the
situation is different for real data, for which SASC-D is inferior to FSASC.

Going back to Table 1 and $\sigma = 0$, we see that the improvement in performance
of the proposed FSASC and SASC-D over the existing SASC-A is dramatic: indeed,
SASC-A succeeds only in the case of hyperplanes, i.e., when $d_1 = d_2 = d_3 = 8$. This
is theoretically expected, since in the case of hyperplanes there is only one normal
direction per subspace, and the gradient of the vanishing polynomial at a point in the
hyperplane is guaranteed to recover this direction. However, when the subspaces have
lower dimensions, as is the case, e.g., for the dimension configuration (4,5,6), then
there are infinitely many orthogonal directions to each subspace. Hence, a priori, the
gradient of a vanishing polynomial may recover any such direction, and such directions
could be dramatically different even for points in the same subspace (e.g., they could
be orthogonal), thus leading to a clustering error of 39%.
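The last point can be made concrete with a small sketch. The arrangement, the polynomial and the names `p`, `grad_p`, `x`, `y` below are a toy construction chosen for illustration (they are not part of the experiments above): two 2-dimensional subspaces of $\mathbb{R}^4$ and a vanishing polynomial whose gradients at two points of the same subspace are both normal to that subspace, yet orthogonal to each other.

```python
import numpy as np

# Toy arrangement in R^4 (an illustrative assumption, not data from the paper):
#   S1 = span(e1, e2) = {x3 = x4 = 0},   S2 = span(e3, e4) = {x1 = x2 = 0}.
# The polynomial p(x) = x1*x3 + x2*x4 vanishes on both S1 and S2.
def p(x):
    return x[0] * x[2] + x[1] * x[3]

def grad_p(x):
    # Analytic gradient of p.
    return np.array([x[2], x[3], x[0], x[1]])

x = np.array([1.0, 0.0, 0.0, 0.0])  # a point of S1
y = np.array([0.0, 1.0, 0.0, 0.0])  # another point of S1

gx, gy = grad_p(x), grad_p(y)
S1_basis = np.eye(4)[:, :2]          # columns e1, e2 span S1

print(p(x), p(y))                    # both points lie on the zero set of p
print(S1_basis.T @ gx, S1_basis.T @ gy)  # both gradients are normal to S1
print(gx @ gy)                       # but the two normals are mutually orthogonal
```

An angle-based affinity built from these two gradients would treat `x` and `y` as unrelated, even though they lie in the same subspace.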

As far as the rest of the self-expressiveness methods are concerned, Table 1 ($\sigma = 0$)
shows what we expect: the methods give a perfect clustering when the subspace
dimensions are small, e.g., for dimension configurations (2,3,4) and (3,3,3), they start
to degrade as the subspace dimensions increase ((4,5,6), (6,6,6)), and eventually they
fail when the subspace dimensions become large enough ((6,7,8), (7,7,7), (8,8,8)). To
examine the effect of the subspace dimension on the connectivity, let us consider
SSC and the dimension configurations (2,3,4) and (2,5,8): Table 2 ($\sigma = 0$) shows
that for both of these configurations the intra-cluster connectivity has a small value
of 10“^. This is expected, since SSC computes sparse affinities and it is known to
produce weakly connected clusters. Now, Table 3 ($\sigma = 0$) shows that the inter-cluster
connectivity of SSC for (2,3,4) is zero, i.e., there are no erroneous connections, and
so, even though the intra-cluster connectivity is as small as 10“^, spectral clustering
can still give a zero clustering error. On the other hand, for the case (2,5,8) the
inter-cluster connectivity is 2%, which, even though small, when coupled with the small
intra-cluster connectivity of 10“^, leads to a spectral clustering error of 49%. Finally,
notice that for the case of (8,8,8) the intra-cluster connectivity is 10“^ and the
inter-cluster connectivity is 46%, indicating that the quality of the produced affinity is very
poor, thus explaining the corresponding clustering error of 55%.

When the data are corrupted by noise ($\sigma = 0.01, 0.03, 0.05$), the remaining entries of
Tables 1-3 show that FSASC is the best method, with the exception of the case
of hyperplanes. In this latter case, i.e., when $d_1 = d_2 = d_3 = 8$, the best method is
SASC-D with a clustering error of 6% when $\sigma = 0.03$, as opposed to 10% for FSASC.
This is expected: for codimension-1 subspaces the length of each
filtration should be precisely 1, since, in theory, the length of the filtration is equal
to the codimension of the subspace associated to the reference point. Since FSASC
automatically determines this length based on the data and the value of the parameter
$\gamma$, it is expected that when the data are noisy, errors will be made in the estimation
of the filtration length. On the other hand, SASC-D is equivalent to FSASC with an
a priori configured filtration length equal to 1, and thus outperforms FSASC in this case.

Table 5

Mean subspace clustering error in % over 100 independent trials for synthetic data randomly
generated in four random subspaces of $\mathbb{R}^9$ of dimensions (8, 8, 5, 3). There are 200 points
associated to each subspace, which are corrupted by zero-mean additive white noise of standard
deviation $\sigma = 0, 0.01, 0.03, 0.05$ and support in the orthogonal complement of each subspace.

    method / $\sigma$    0        0.01     0.03     0.05
    FSASC                0        2.19     5.08     7.65
    SASC-D               22.88    17.83    15.93    17.44
    SASC-A               22.88    27.21    31.43    36.36
    SSC                  64.39    64.17    64.36    64.13
    LRR                  42.86    42.88    43.04    42.91
    LRR-H                42.08    42.06    42.23    42.21
    LRSC                 42.85    42.88    43.05    42.90
    LSR                  42.84    42.85    43.00    42.93
    LSR-H                38.72    38.74    38.96    39.86


Certainly, giving as input to FSASC more than one value for $\gamma$, as shown in Algorithm
4, is expected to address this issue, but also to increase the running time of FSASC (see
Table 4 for average running times of the methods in the current experiment).

We conclude this section by demonstrating the interesting property of FSASC
of being able to give the correct clustering by using vanishing polynomials of degree
strictly less than the true number of subspaces. Towards that end, we consider a
similar situation as above, except that now we have $n = 4$ subspaces of dimensions
(8, 8, 5, 3). Contrary to SASC-D and SASC-A, for which the theory requires degree-4
polynomials, FSASC is still applicable if one works with polynomials of degree 3: the
crucial observation is that for the dimension configuration (8, 8, 5, 3), the corresponding
subspace arrangement always admits vanishing polynomials of degree 3, and the
same is true for every intermediate arrangement occurring in a filtration. For example,
if one lets $b_1$ be a normal vector to one of the 8-dimensional subspaces, $b_2$
a normal vector to the other, and $b_3$ a normal vector to the 8-dimensional subspace
spanned by the 5-dimensional and the 3-dimensional subspaces, then the polynomial
$p(x) = (b_1^\top x)(b_2^\top x)(b_3^\top x)$ has degree 3 and vanishes on the entire arrangement of the
four subspaces. Interestingly, Table 5 shows that FSASC gives zero error in the absence
of noise and 7.65% error for the worst case $\sigma = 0.05$, while all other methods
fail. In particular, the other two algebraic methods, i.e., SASC-D and SASC-A, are
not able to cluster the data using a single vanishing polynomial of degree 3.
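The degree-3 construction described above can be checked numerically. The following is a minimal sketch under our own assumptions (random subspaces of $\mathbb{R}^9$ drawn with NumPy, noiseless samples, helper names `random_subspace` and `normal_to` that are purely illustrative); it is not the experimental code behind Table 5.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 9  # ambient dimension, so that 8-dimensional subspaces are hyperplanes

def random_subspace(d):
    # Orthonormal basis (D x d) of a random d-dimensional subspace.
    Q, _ = np.linalg.qr(rng.standard_normal((D, d)))
    return Q

def normal_to(basis):
    # Unit vector orthogonal to the columns of a D x (D-1) matrix of full rank:
    # the last left singular vector spans the orthogonal complement.
    U, _, _ = np.linalg.svd(basis, full_matrices=True)
    return U[:, -1]

bases = [random_subspace(d) for d in (8, 8, 5, 3)]
b1 = normal_to(bases[0])
b2 = normal_to(bases[1])
# The 5- and 3-dimensional subspaces together span an 8-dimensional subspace.
b3 = normal_to(np.hstack([bases[2], bases[3]]))

def p(x):
    # Product of three linear forms: a polynomial of degree 3.
    return (b1 @ x) * (b2 @ x) * (b3 @ x)

# 50 random points on each of the four subspaces.
points = [U @ rng.standard_normal(U.shape[1]) for U in bases for _ in range(50)]
max_abs_on_arrangement = max(abs(p(x)) for x in points)
value_off_arrangement = p(rng.standard_normal(D))

print(max_abs_on_arrangement)   # numerically zero
print(value_off_arrangement)    # generically nonzero
```

The polynomial has degree 3 although the arrangement consists of four subspaces, which is exactly the situation exploited by FSASC in this experiment.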

6.2. Experiments on real motion sequences. We evaluate different methods
on the Hopkins155 motion segmentation data set [31], which contains 155 videos of
$n = 2, 3$ moving objects, each one with N = 100-500 feature point trajectories of
dimension D = 56-80. While SSC, LRR, LRSC and LSR can operate directly on the
raw data, algebraic methods require $\mathcal{M}_n(D) \le N$. Hence, for algebraic methods, we
project the raw data onto the subspace spanned by their $D$ principal components,
where $D$ is the largest integer $\le 8$ such that $\mathcal{M}_n(D) \le N$, and then normalize each
point to have unit norm. We apply SSC to i) the raw data (SSC-raw) and ii) the raw
points projected onto their first 8 principal components and normalized to unit norm
(SSC-proj). For FSASC we use $L = 10$ and $\gamma = 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10$.
LRR, LRSC and LSR use the same parameters as in section 6.1, while for SSC the
parameters are $\alpha = 800$ and $\rho = 0.7$.
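The choice of projection dimension for the algebraic methods can be sketched as follows. We assume here that $\mathcal{M}_n(D)$ denotes the number of monomials of degree $n$ in $D$ variables, i.e., $\binom{n+D-1}{n}$ (its definition lies outside this section), and that both inequalities are non-strict; the function names are hypothetical.

```python
from math import comb

def num_monomials(n, D):
    # Number of monomials of degree n in D variables: C(n + D - 1, n).
    # Assumed here to be what M_n(D) stands for.
    return comb(n + D - 1, n)

def projection_dimension(n, N, D_max=8):
    # Largest D <= D_max such that num_monomials(n, D) <= N.
    feasible = [D for D in range(1, D_max + 1) if num_monomials(n, D) <= N]
    if not feasible:
        raise ValueError("no admissible projection dimension")
    return max(feasible)

print(projection_dimension(2, 100))  # two motions, N = 100 trajectories
print(projection_dimension(3, 100))  # three motions, N = 100 trajectories
print(projection_dimension(3, 500))  # three motions, N = 500 trajectories
```

Under these assumptions, sequences with three motions and few trajectories are projected to fewer than 8 dimensions, because $\binom{10}{3} = 120$ monomials would exceed $N = 100$ data points.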

The clustering errors and the intra/inter-cluster connectivities are reported in
Table 6 and Fig. 4. Notice the clustering errors of about 5% and 37% for SASC-A
for two and three motions, respectively. Replacing the angle-based affinity with
the distance-based one, SASC-D already gives errors of around 5.5% and 14%.
Most dramatically, FSASC further reduces those errors to 0.8% and
2.48%. Moreover, even though the dimensions of the subspaces ($d_i \in \{1,2,3,4\}$ for
motion segmentation) are low relative to the ambient space dimension (D = 56-80),
a case that is specifically suited for SSC, LRR, LRSC and LSR, projecting the
data to $D \le 8$, which makes the subspace dimensions comparable to the ambient
dimension, is sufficient for FSASC to get superior performance relative to the best
performing algorithms on Hopkins155. We believe that this is because, overall,
FSASC produces a much higher intra-cluster connectivity, without increasing the
inter-cluster connectivity too much.

Table 6

Mean clustering error (E) in %, intra-cluster connectivity ($C_1$), and inter-cluster connectivity
($C_2$) in % for the Hopkins155 data set.

                 2 motions             3 motions             all motions
    method       E      C1     C2      E      C1     C2      E      C1     C2
    FSASC        0.80   0.18   4       2.48   0.10   10      1.18   0.16   5
    SASC-D       5.65   0.82   26      14.0   0.80   46      7.59   0.81   31
    SASC-A       4.99   0.35   5       36.8   0.09   35      12.2   0.29   12
    SSC-raw      1.53   0.05   2       4.40   0.04   3       2.18   0.05   2
    SSC-proj     5.87   0.04   3       5.70   0.03   3       5.83   0.03   3
    LRR          4.26   0.25   19      7.78   0.25   28      5.05   0.25   21
    LRR-H        2.25   0.05   2       3.40   0.04   3       2.51   0.05   2
    LRSC         3.38   0.25   19      7.42   0.24   28      4.29   0.25   21
    LSR          3.60   0.24   18      7.77   0.23   28      4.54   0.23   21
    LSR-H        2.73   0.04   1       2.60   0.03   2       2.70   0.04   1

[Plot omitted: horizontal axis "sequence index"; recoverable legend entry: FSASC (1.18%).]

Fig. 4. Clustering error ratios for both 2 and 3 motions in Hopkins155, ordered increasingly
for each method. Errors start from the 90-th smallest error of each method.

7. Conclusions and Future Research. We presented a novel family of subspace
clustering algorithms, termed Filtrated Algebraic Subspace Clustering (FASC).
The common theme of these algorithms is the notion of a filtration of subspace
arrangements. The first algorithm of the family, termed Filtrated Algebraic Subspace
Clustering (FASC), receives as input a finite point set in general position inside a
subspace arrangement, together with an upper bound on the number of subspaces in
the arrangement. Then FASC provably returns the number of the subspaces, their
dimensions, as well as a basis for the orthogonal complement of each subspace. The
second algorithm of the family, termed Filtrated Spectral Algebraic Subspace Clustering
(FSASC), is an adaptation of FASC to a working algorithm that is robust to noise.
In fact, by experiments on synthetic and real data we showed that FSASC is superior
to state-of-the-art subspace clustering algorithms on several occasions.

Due to the power of the machinery of filtrations, FSASC is unique among
subspace clustering algorithms in that it can robustly handle subspaces of potentially
very different dimensions, which can be arbitrarily close to or far from the dimension of
the ambient space. This is an important feature that distinguishes FSASC from
state-of-the-art Sparse and Low-Rank methods, which are in principle applicable only when the
subspace dimensions are sufficiently small relative to the ambient dimension. However,
this advantage of FSASC comes at the cost of a high computational complexity.
Future research will address the problem of reducing this complexity with the aim
of making FSASC applicable to large scale datasets. Additional challenges to be
undertaken include making FSASC robust to missing entries and outliers.

Appendix A. Notions From Commutative Algebra. A central concept in
the theory of polynomial algebra is that of an ideal:

Definition 29 (Ideal). A subset $\mathcal{I}$ of the ring $\mathbb{R}[x] := \mathbb{R}[x_1,\dots,x_D]$ of polynomials
is called an ideal if for every $p, q \in \mathcal{I}$ and every $r \in \mathbb{R}[x]$ we have that $p + q \in \mathcal{I}$
and $rp \in \mathcal{I}$. If $p_1,\dots,p_n$ are elements of $\mathbb{R}[x]$, then the ideal generated by these
elements is the set of all linear combinations of the $p_i$ with coefficients in $\mathbb{R}[x]$.

A polynomial $f \in \mathbb{R}[x]$ is called homogeneous of degree $r$, if all the monomials
that appear in $f$ have degree $r$. An ideal $\mathcal{I}$ is called homogeneous, if it is generated by
homogeneous elements, i.e., $\mathcal{I} = (f_1,\dots,f_s)$ where $f_i$ is a homogeneous polynomial
of degree $r_i$. The reader can check that an ideal $\mathcal{I}$ is homogeneous if and only if
$\mathcal{I} = \bigoplus_{k \ge 0} \mathcal{I}_k$, where $\mathcal{I}_k = \mathcal{I} \cap \mathbb{R}[x]_k$. It is not hard to see that the intersection and the
sum of two (homogeneous) ideals is a (homogeneous) ideal. In performing algebraic
operations with ideals it is also useful to have a notion of product of ideals:

Definition 30 (Product of ideals). Let $\mathcal{I}_1, \mathcal{I}_2$ be ideals of $\mathbb{R}[x]$. The product
$\mathcal{I}_1 \mathcal{I}_2$ of $\mathcal{I}_1, \mathcal{I}_2$ is defined to be the set of all elements of the form $p_1 q_1 + \cdots + p_m q_m$
for any $m \in \mathbb{N}$, $p_i \in \mathcal{I}_1$, $q_i \in \mathcal{I}_2$.
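As a small worked example (ours, added for illustration), the product of two ideals is always contained in their intersection, and the two may or may not coincide. For principal ideals generated by coordinate functions:

```latex
\[
\mathcal{I}_1 = (x_1),\ \mathcal{I}_2 = (x_2):
\qquad
\mathcal{I}_1 \mathcal{I}_2 = (x_1 x_2) = \mathcal{I}_1 \cap \mathcal{I}_2 ,
\]
\[
\mathcal{I}_1 = \mathcal{I}_2 = (x_1):
\qquad
\mathcal{I}_1 \mathcal{I}_2 = (x_1^2) \subsetneq (x_1) = \mathcal{I}_1 \cap \mathcal{I}_2 .
\]
```

In the second case $x_1$ lies in the intersection but not in the product, since every element of $(x_1^2)$ is divisible by $x_1^2$.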

The notion of a prime ideal is a natural generalization of the notion of a prime number. 
Prime ideals play a fundamental role in the study of the structure of general ideals, 
in analogy to the role that prime numbers have in the structure of integers. 

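Definition 31 (Prime ideal). An ideal $\mathfrak{p}$ of $\mathbb{R}[x]$ is called prime, if whenever
$pq \in \mathfrak{p}$ for some $p, q \in \mathbb{R}[x]$, then either $p \in \mathfrak{p}$ or $q \in \mathfrak{p}$.

We note that if $\mathfrak{p}$ is a homogeneous ideal, then in order to check whether $\mathfrak{p}$ is prime,
it is enough to consider homogeneous polynomials $p, q$ in the above definition.

Proposition 32. Let $\mathfrak{p}, \mathcal{I}_1, \dots, \mathcal{I}_n$ be ideals of $\mathbb{R}[x]$ with $\mathfrak{p}$ being prime. If
$\mathfrak{p} \supseteq \mathcal{I}_1 \cap \cdots \cap \mathcal{I}_n$, then $\mathfrak{p} \supseteq \mathcal{I}_i$ for some $i \in [n]$.

Proof. Suppose $\mathfrak{p} \not\supseteq \mathcal{I}_i$ for all $i$. Then for every $i$ there exists $x_i \in \mathcal{I}_i - \mathfrak{p}$. But
then $\prod_{i=1}^n x_i \in \mathcal{I}_1 \cap \cdots \cap \mathcal{I}_n \subseteq \mathfrak{p}$, and since $\mathfrak{p}$ is prime, some $x_j \in \mathfrak{p}$, a contradiction. □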

A final notion that we need is that of a radical ideal: 
Definition 33. An ideal $\mathcal{I}$ of $\mathbb{R}[x]$ is called radical, if whenever some $p \in \mathbb{R}[x]$
satisfies $p^\ell \in \mathcal{I}$ for some $\ell$, then it must be the case that $p \in \mathcal{I}$.

Radical ideals have a very nice structure:

Theorem 34. Every radical ideal $\mathcal{I}$ of $\mathbb{R}[x]$ can be written uniquely as the finite
intersection of prime ideals. Conversely, the intersection of a finite number of prime
ideals is always a radical ideal.
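A short worked example (ours, for illustration) in $\mathbb{R}[x_1,x_2,x_3]$ shows both directions of the theorem at work. The first ideal is an intersection of two primes and hence radical; it is the vanishing ideal of the union of the plane $x_1 = 0$ and the line $x_2 = x_3 = 0$. The second ideal is not radical.

```latex
\[
(x_1 x_2,\; x_1 x_3) \;=\; (x_1) \,\cap\, (x_2, x_3)
\qquad \text{(radical: an intersection of two prime ideals)},
\]
\[
x_1^2 \in (x_1^2) \quad \text{but} \quad x_1 \notin (x_1^2)
\qquad \text{(so $(x_1^2)$ is not radical)}.
\]
```

The first equality holds because $x_1 f \in (x_2,x_3)$ forces $f \in (x_2,x_3)$, as $(x_2,x_3)$ is prime and does not contain $x_1$.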

For further information on commutative algebra we refer the reader to [1] and [7] or 
to the more advanced treatment of [26]. 

Appendix B. Notions From Algebraic Geometry. The central object of
algebraic geometry is that of an algebraic variety.

Definition 35 (Algebraic variety). A subset $\mathcal{Y}$ of $\mathbb{R}^D$ is called an algebraic
variety or algebraic set if it is the zero-locus of some ideal $\mathfrak{a}$ of $\mathbb{R}[x]$, i.e.,
$\mathcal{Y} = \{y \in \mathbb{R}^D : p(y) = 0, \forall p \in \mathfrak{a}\}$. A standard notation is to write $\mathcal{Y} = \mathcal{Z}(\mathfrak{a})$, where the
operator $\mathcal{Z}(\cdot)$ denotes zero set.

If $\mathcal{Y} = \mathcal{Z}(\mathfrak{a})$ is an algebraic variety, then certainly every polynomial of $\mathfrak{a}$ vanishes
on the entire $\mathcal{Y}$ (by definition). However, there may be more polynomials with that
property, and they have a special name:

Definition 36 (Vanishing ideal). The vanishing ideal of a subset $\mathcal{Y}$ of $\mathbb{R}^D$, denoted
$\mathcal{I}_\mathcal{Y}$, is the set of all polynomials of $\mathbb{R}[x]$ that vanish on every point of $\mathcal{Y}$, i.e.,
$\mathcal{I}_\mathcal{Y} = \{p \in \mathbb{R}[x] : p(y) = 0, \forall y \in \mathcal{Y}\}$.

It can be shown that the algebraic varieties induce a topology on $\mathbb{R}^D$:

Definition 37 (Zariski topology). The Zariski Topology on $\mathbb{R}^D$ is the topology
generated by defining the closed sets to be all the algebraic varieties.

Applying the definition of an irreducible topological space in the context of the Zariski
topology, we obtain:

Definition 38 (Irreducible algebraic variety). An algebraic variety $\mathcal{Y}$ is called
irreducible if it can not be written as the union of two proper subsets of $\mathcal{Y}$ that are
closed in the subspace topology of $\mathcal{Y}$.

The following Theorem is one of many interesting connections between geometry and 
algebra: 

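Theorem 39. An algebraic variety $\mathcal{Y} = \mathcal{Z}(\mathfrak{a})$ is irreducible if and only if its
vanishing ideal $\mathcal{I}_\mathcal{Y}$ is prime.

Perhaps not surprisingly, irreducible varieties are the fundamental building blocks of
general varieties:

Theorem 40 (Irreducible decomposition). Every algebraic variety $\mathcal{Y}$ of $\mathbb{R}^D$ can
be uniquely written as $\mathcal{Y} = \mathcal{Y}_1 \cup \cdots \cup \mathcal{Y}_n$, where $\mathcal{Y}_i$ are irreducible varieties and there
are no inclusions $\mathcal{Y}_i \subset \mathcal{Y}_j$ for $i \neq j$. The varieties $\mathcal{Y}_i$ are referred to as the irreducible
components of $\mathcal{Y}$.

Proposition 41. If $\mathcal{Y}_1 = \mathcal{Z}(\mathfrak{a}_1)$, $\mathcal{Y}_2 = \mathcal{Z}(\mathfrak{a}_2)$ are algebraic varieties such that
$\mathfrak{a}_1 \subseteq \mathfrak{a}_2$, then $\mathcal{Y}_1 \supseteq \mathcal{Y}_2$.

Note on Definition 38: certain authors (e.g., [15]) reserve the term algebraic variety to refer
to an irreducible closed set.

Theorem 42. If two subsets $\mathcal{Y}_1, \mathcal{Y}_2$ of $\mathbb{R}^D$ satisfy the inclusion $\mathcal{Y}_1 \supseteq \mathcal{Y}_2$, then
their vanishing ideals will satisfy the reverse inclusion $\mathcal{I}_{\mathcal{Y}_1} \subseteq \mathcal{I}_{\mathcal{Y}_2}$.

Proposition 43. Let $\mathcal{Y}_1 = \mathcal{Z}(\mathfrak{a}_1)$, $\mathcal{Y}_2 = \mathcal{Z}(\mathfrak{a}_2)$ be varieties of $\mathbb{R}^D$. Then
$\mathcal{Y}_1 \cap \mathcal{Y}_2 = \mathcal{Z}(\mathfrak{a}_1 + \mathfrak{a}_2)$.

The final result that we present characterizes the set of all points that arise as the
zero set of the vanishing ideal of an arbitrary subset $\mathcal{Y}$ of $\mathbb{R}^D$.

Proposition 44. Let $\mathcal{Y}$ be a subset of $\mathbb{R}^D$ and $\mathcal{I}_\mathcal{Y}$ its vanishing ideal. Then
$\mathcal{Z}(\mathcal{I}_\mathcal{Y}) = \overline{\mathcal{Y}}$, where $\overline{\mathcal{Y}}$ is the topological closure of $\mathcal{Y}$ in the Zariski topology.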

Finally, it should be noted that most of classic and modern algebraic geometry [15]
assumes that the underlying algebraic field (in this paper $\mathbb{R}$) is algebraically closed
[18]. An example of an algebraically closed field is the complex numbers $\mathbb{C}$. Consequently,
one should be careful when using results such as Hilbert's Nullstellensatz in
real polynomial rings.
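A standard example (added here for illustration, not taken from the text above) shows what can go wrong over $\mathbb{R}$. Take $\mathfrak{a} = (x_1^2 + x_2^2) \subset \mathbb{R}[x_1,x_2]$. Since $x_1^2 + x_2^2$ is irreducible over $\mathbb{R}$, the ideal $\mathfrak{a}$ is prime and therefore radical, yet it is not the vanishing ideal of its own zero set:

```latex
\[
\mathcal{Z}(\mathfrak{a}) = \{(0,0)\} \subset \mathbb{R}^2,
\qquad
\mathcal{I}_{\mathcal{Z}(\mathfrak{a})} = (x_1, x_2) \supsetneq (x_1^2 + x_2^2) = \mathfrak{a} .
\]
```

Over $\mathbb{C}$ the same polynomial factors as $(x_1 + i x_2)(x_1 - i x_2)$, its zero set is a union of two lines, and the Nullstellensatz gives back $\mathfrak{a}$ as the vanishing ideal of that zero set.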


Appendix C. Subspace Arrangements and their Vanishing Ideals. We
begin by defining the main mathematical object of interest in this paper.

Definition 45 (Subspace arrangement). A union $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ of linear
subspaces $\mathcal{S}_1,\dots,\mathcal{S}_n$ of $\mathbb{R}^D$, with $D \ge 1$, $n \ge 1$, is called a subspace arrangement.

It is often technically convenient to work with subspace arrangements that are as
general as possible. One way to capture this notion is by the following definition.

Definition 46 (Transversal subspace arrangement [5]). A subspace arrangement
$\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i \subset \mathbb{R}^D$ is called transversal, if for any subset $\mathfrak{J}$ of $[n]$, the codimension
of $\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i$ is the minimum between $D$ and the sum of the codimensions of all
$\mathcal{S}_i$, $i \in \mathfrak{J}$, i.e.,

(35)    $\operatorname{codim}\Big(\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i\Big) = \min\Big\{D, \sum_{i \in \mathfrak{J}} c_i\Big\},$

where $c_i = \operatorname{codim} \mathcal{S}_i$.


Transversality is a geometric condition on the subspaces $\mathcal{S}_1,\dots,\mathcal{S}_n$, that requires all
possible intersections among the subspaces to be as small as possible, as allowed by
the dimensions of the subspaces. To see this, let $\mathfrak{J}$ be a subset of $[n]$, which without
loss of generality can be taken to be $\mathfrak{J} = \{1,2,\dots,\ell\} = [\ell]$, where $\ell \le n$. For
every $i \in \mathfrak{J}$ let $B_i$ be a $D \times c_i$ matrix, whose columns form a basis for $\mathcal{S}_i^\perp$, where
$c_i = \operatorname{codim} \mathcal{S}_i := D - \dim \mathcal{S}_i$, and let $B = [B_1 \cdots B_\ell]$. Then the intersection $\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i$
can be described algebraically as

(36)    $x \in \bigcap_{i \in \mathfrak{J}} \mathcal{S}_i \iff B^\top x = 0.$

From (36) it is clear that the dimension of $\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i$ is equal to the dimension of the
right nullspace of $B^\top$, or equivalently

(37)    $\operatorname{codim}\Big(\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i\Big) = \operatorname{rank}(B).$

Now, $B$ is a $D \times \big(\sum_{i \in \mathfrak{J}} c_i\big)$ matrix and so its rank will satisfy

(38)    $\operatorname{rank}(B) \le \min\Big\{D, \sum_{i \in \mathfrak{J}} c_i\Big\},$

which in conjunction with (37) justifies the geometric interpretation of Definition 4.
In fact, if $\mathcal{A}$ is not transversal, then there exists some subset $\mathfrak{J} \subseteq [n]$, for which $B$ is
rank-deficient, which shows that certain algebraic relations must be satisfied among
the parametrizations $B_1,\dots,B_n$ of the subspaces $\mathcal{S}_1,\dots,\mathcal{S}_n$. This is essentially the
argument behind the proof of the next Proposition, which shows that transversality
is not a strong condition, rather it will be satisfied almost surely.

Proposition 47. Let $\mathcal{A}$ be a subspace arrangement consisting of $n$ linear subspaces
of $\mathbb{R}^D$ of dimensions $d_1,\dots,d_n$. If $\mathcal{A}$ is chosen uniformly at random, then $\mathcal{A}$
will be transversal with probability 1.

Example 48. An arrangement $\mathcal{A} = \mathcal{S}_1 \cup \mathcal{S}_2 \cup \mathcal{S}_3 \subset \mathbb{R}^D$ such that $\mathcal{S}_1 \subset \mathcal{S}_2$ is
non-transversal, since $\operatorname{codim}(\mathcal{S}_1 \cap \mathcal{S}_2) = \operatorname{codim} \mathcal{S}_1 = c_1 < \min\{D, c_1 + c_2\}$. Note that
when choosing $\mathcal{S}_1, \mathcal{S}_2, \mathcal{S}_3$ uniformly at random, the event $\mathcal{S}_1 \subset \mathcal{S}_2$ has probability zero.

Example 49. An arrangement of three planes $\mathcal{A} = \mathcal{H}_1 \cup \mathcal{H}_2 \cup \mathcal{H}_3$ of $\mathbb{R}^3$ that intersect
on a line is non-transversal, because $\operatorname{codim}(\mathcal{H}_1 \cap \mathcal{H}_2 \cap \mathcal{H}_3) = 2 < \min\{3, 1 + 1 + 1\}$.
When $\mathcal{H}_1, \mathcal{H}_2, \mathcal{H}_3$ are chosen uniformly at random, which is equivalent to choosing
their normal vectors $b_1, b_2, b_3$ uniformly at random, the three planes intersect on a
line only if $b_1, b_2, b_3$ are linearly dependent, which is a probability zero event.
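The rank characterization of transversality, i.e., that the codimension of an intersection equals the rank of the stacked matrix of normal vectors, can be turned into a direct numerical test. The sketch below is ours: `is_transversal` is a hypothetical helper that takes, for each subspace, a matrix whose columns are assumed to form a basis of its orthogonal complement, and it enumerates all index subsets (so it is only practical for small $n$).

```python
import itertools
import numpy as np

def is_transversal(normals, tol=1e-9):
    """normals[i] is a D x c_i matrix whose columns span the orthogonal
    complement of the i-th subspace (assumed linearly independent)."""
    D = normals[0].shape[0]
    n = len(normals)
    for size in range(1, n + 1):
        for J in itertools.combinations(range(n), size):
            B = np.hstack([normals[i] for i in J])
            # codim of the intersection over J equals rank(B)
            if np.linalg.matrix_rank(B, tol=tol) != min(D, B.shape[1]):
                return False
    return True

e1, e2, e3 = np.eye(3)

# Three planes of R^3 with linearly dependent normals: they share the line span(e3).
planes_through_line = [e1[:, None], e2[:, None], (e1 + e2)[:, None]]

# A line contained in a plane: line span(e3) inside the plane {x1 = 0}.
nested = [np.column_stack([e1, e2]), e1[:, None]]

# Three planes with randomly drawn normals.
rng = np.random.default_rng(0)
random_planes = [rng.standard_normal((3, 1)) for _ in range(3)]

print(is_transversal(planes_through_line))  # False
print(is_transversal(nested))               # False
print(is_transversal(random_planes))        # True
```

The first two inputs mirror the two non-transversal examples above, while randomly drawn normals give a transversal arrangement, in line with the almost-sure statement of the proposition.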

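Another notion of subspace arrangements in general position that is closely related
to transversal arrangements, is that of linearly general subspaces.

Definition 50 (Linearly general subspace arrangement [4]). A subspace arrangement
$\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ is called linearly general, if for every subset $\mathfrak{J} \subseteq [n]$ we have

(39)    $\dim\Big(\sum_{i \in \mathfrak{J}} \mathcal{S}_i\Big) = \min\Big\{D, \sum_{i \in \mathfrak{J}} d_i\Big\},$

where $d_i = \dim \mathcal{S}_i$.

As the reader may suspect, the notion of transversal and linearly general are dual to
each other in the following sense.

Proposition 51. A subspace arrangement $\bigcup_{i=1}^n \mathcal{S}_i$ is transversal if and only if
the subspace arrangement $\bigcup_{i=1}^n \mathcal{S}_i^\perp$ is linearly general.

Proof. This follows by noting that with reference to the matrix $B$ constructed
below Definition 4, we have

(40)    $\operatorname{codim}\Big(\bigcap_{i \in \mathfrak{J}} \mathcal{S}_i\Big) = \operatorname{rank}(B) = \dim\Big(\sum_{i \in \mathfrak{J}} \mathcal{S}_i^\perp\Big)$

and that $\operatorname{codim} \mathcal{S}_i = \dim \mathcal{S}_i^\perp$. □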

In order to understand some important properties of subspace arrangements, it is
necessary to examine the algebraic-geometric properties of a single subspace $\mathcal{S}$ of $\mathbb{R}^D$
of dimension $d$. Let $b_1,\dots,b_c$ be a basis for the orthogonal complement of $\mathcal{S}$, where
$c = D - d$, and define the polynomials $p_i(x) = b_i^\top x$, $i = 1,\dots,c$. Notice that $p_i(x)$
is homogeneous of degree 1 and is thus also referred to as a linear form. If a point $x$
belongs to $\mathcal{S}$, then $p_i(x) = 0$, $\forall i$. Conversely, if a point $x \in \mathbb{R}^D$ satisfies $p_i(x) = 0$, $\forall i$,
then $x \in \mathcal{S}$. This shows that $\mathcal{S} = \mathcal{Z}(p_1,\dots,p_c)$, i.e., $\mathcal{S}$ is an algebraic variety. Notice
that the set of linear forms that vanish on $\mathcal{S}$ is a vector space and the polynomials
$p_i$, $i = 1,\dots,c$ form a basis.
The Proposition that follows asserts that the vanishing ideal of $\mathcal{S}$, i.e., the set of
all polynomials that vanish at every point of $\mathcal{S}$, is in fact generated by the polynomials
$p_i(x)$, $i = 1,\dots,c$.

Proposition 52 (Vanishing Ideal of a Subspace). Let $\mathcal{S} = \operatorname{Span}(b_1,\dots,b_c)^\perp$
be a subspace of $\mathbb{R}^D$ defined as the orthogonal complement of the space spanned by
$\{b_1,\dots,b_c\}$ over $\mathbb{R}$. Then $\mathcal{I}_\mathcal{S}$ is generated over $\mathbb{R}[x]$ by the linear forms $b_1^\top x,\dots,b_c^\top x$.

Proof. Let $\{b_1,\dots,b_c\}$ be a basis for the orthogonal complement of $\mathcal{S}$ and augment
it to a basis $\{b_1,\dots,b_c,h_1,\dots,h_{D-c}\}$ of $\mathbb{R}^D$, where $h_1,\dots,h_{D-c}$ is a basis for
$\mathcal{S}$. Now define a change of basis transformation $\varphi : \mathbb{R}^D \to \mathbb{R}^D$, which maps the basis
$\{b_1,\dots,b_c,h_1,\dots,h_{D-c}\}$ to the canonical basis $\{e_1,\dots,e_D\}$ of $\mathbb{R}^D$, where $e_i$ is the
$i$-th column of the $D \times D$ identity matrix. Notice that $b_i$ is mapped to $e_i$ and as
a consequence $\mathcal{S}$ is mapped to the orthogonal complement of the vectors $e_1,\dots,e_c$.
Since $\varphi$ is a vector space isomorphism, we do not lose generality if we assume from
the beginning that $\mathcal{S} = \operatorname{Span}(e_1,\dots,e_c)^\perp = \operatorname{Span}(e_{c+1},\dots,e_D)$ and that the vector space
of linear forms that vanish on $\mathcal{S}$ is spanned by $x_1,\dots,x_c$. Notice that $x \in \mathcal{S}$ if and only if the
first $c$ coordinates of $x$ are zero.

Now let $g \in \mathcal{I}_\mathcal{S}$. We can write $g(x) = \tilde{g}(x_{c+1},\dots,x_D) + \sum_{i=1}^c x_i g_i(x)$, where $\tilde{g}$
collects the terms of $g$ that involve none of $x_1,\dots,x_c$. By
hypothesis we have $g(0,\dots,0,a_{c+1},\dots,a_D) = 0$ for any real numbers $a_{c+1},\dots,a_D$,
which implies that $\tilde{g}(a_{c+1},\dots,a_D) = 0$, $\forall a_{c+1},\dots,a_D \in \mathbb{R}$. This in turn implies that
$\tilde{g}$ is the zero polynomial. Hence $g(x) = \sum_{i=1}^c x_i g_i(x)$, which shows that $g$ is inside
the ideal generated by the linear forms that vanish on $\mathcal{S}$. □

In algebraic-geometric notation, the above proposition can be concisely stated as
$\mathcal{I}_\mathcal{S} = (b_1^\top x,\dots,b_c^\top x)$. Interestingly, the vanishing ideal of a subspace is a
prime ideal:

Proposition 53. Let $\mathcal{S}$ be a subspace of $\mathbb{R}^D$. Then $\mathcal{S}$ is irreducible in the Zariski
topology of $\mathbb{R}^D$ or equivalently, $\mathcal{I}_\mathcal{S}$ is a prime ideal of $\mathbb{R}[x]$.

Proof. As in the proof of Proposition 52 we can assume that $(x_1,\dots,x_c)$ is a basis
for the linear forms of $\mathbb{R}[x]$ that vanish on $\mathcal{S}$. Then $\mathcal{I}_\mathcal{S} = (x_1,\dots,x_c)$ and our task is
to show that $\mathcal{I}_\mathcal{S}$ is prime. So let $f, g$ be homogeneous polynomials such that $fg \in \mathcal{I}_\mathcal{S}$
and suppose that $f \notin \mathcal{I}_\mathcal{S}$. We will show that $g \in \mathcal{I}_\mathcal{S}$. We can write $f = f_1 + f_2$,
where $f_1, f_2$ are polynomials such that $f_1 \in \mathcal{I}_\mathcal{S}$ and $f_2 \in \mathbb{R}[x_{c+1},\dots,x_D]$. Similarly
$g = g_1 + g_2$, with $g_1 \in \mathcal{I}_\mathcal{S}$ and $g_2 \in \mathbb{R}[x_{c+1},\dots,x_D]$. Since by hypothesis $f \notin \mathcal{I}_\mathcal{S}$, it
must be the case that $f_2 \neq 0$. To show that $g \in \mathcal{I}_\mathcal{S}$, it is enough to show that $g_2 = 0$.

Towards that end, notice that $fg = (f g_1 + f_1 g_2) + f_2 g_2$, where $f g_1 + f_1 g_2 \in \mathcal{I}_\mathcal{S}$.
Since by hypothesis $fg \in \mathcal{I}_\mathcal{S}$, we also have that $f_2 g_2 \in \mathcal{I}_\mathcal{S}$. This means that there exist
polynomials $h_1,\dots,h_c \in \mathbb{R}[x_1,\dots,x_D]$, such that $f_2 g_2 = x_1 h_1 + \cdots + x_c h_c$. However,
none of the variables $x_1,\dots,x_c$ appear on the left hand side of this equation, and
so this equation is true only when both sides are equal to zero. Since by hypothesis
$f_2 \neq 0$, this implies that $g_2 = 0$, and so $g \in \mathcal{I}_\mathcal{S}$.

Alternative Proof: A more direct proof exists if we assume familiarity of the reader
with quotient rings. In particular, it is known that an ideal $I$ of a commutative ring $R$
is prime if and only if the quotient ring $R/I$ has no zero-divisors [1]. By noticing that
$\mathbb{R}[x_1,\dots,x_D]/(x_1,\dots,x_c) \cong \mathbb{R}[x_{c+1},\dots,x_D]$ we immediately see that $(x_1,\dots,x_c)$ is
prime. □

Note on the proof of Proposition 52: one can prove by induction on $d$ that if $F$ is an infinite
field and $g(x_1,\dots,x_d) = 0$, $\forall x_1,\dots,x_d \in F$, then $g = 0$.

Returning to the subspace arrangements, we see that a subspace arrangement
$\mathcal{A} = \mathcal{S}_1 \cup \cdots \cup \mathcal{S}_n$ is the union of irreducible algebraic varieties $\mathcal{S}_1,\dots,\mathcal{S}_n$. This
immediately suggests that the subspace arrangement itself is an algebraic variety. This
was established in [25] via an alternative argument. Additionally, in view of Theorem
40, the irreducible components of $\mathcal{A}$ are precisely its constituent subspaces $\mathcal{S}_1,\dots,\mathcal{S}_n$,
which also proves that a subspace arrangement can be uniquely written as the union
of subspaces among which there are no inclusions. We summarize these observations
in the following theorem:

Theorem 54. Let $\mathcal{S}_1,\dots,\mathcal{S}_n$ be subspaces of $\mathbb{R}^D$ such that no inclusions exist
between any two subspaces. Then the arrangement $\mathcal{A} = \mathcal{S}_1 \cup \cdots \cup \mathcal{S}_n$ is an algebraic
variety and its irreducible components are $\mathcal{S}_1,\dots,\mathcal{S}_n$.

The vanishing ideal of a subspace arrangement $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ is readily seen to
relate to the vanishing ideals of its irreducible components via the formula

(41)    $\mathcal{I}_\mathcal{A} = \mathcal{I}_{\mathcal{S}_1} \cap \cdots \cap \mathcal{I}_{\mathcal{S}_n}.$

Since $\mathcal{I}_{\mathcal{S}_i}$ is a prime ideal, Theorem 34 implies that $\mathcal{I}_\mathcal{A}$ is radical and that $\mathcal{A}$ uniquely
determines the ideals $\mathcal{I}_{\mathcal{S}_1},\dots,\mathcal{I}_{\mathcal{S}_n}$, assuming that there are no inclusions between the
subspaces. Hence, retrieving the irreducible components of a subspace arrangement
is equivalent to computing the prime factors of its vanishing ideal $\mathcal{I}_\mathcal{A}$.

Since the ideal of a single subspace $\mathcal{S}_i$ is generated by linear forms, i.e., it is
generated in degree 1, one may be tempted to conjecture that the ideal $\mathcal{I}_\mathcal{A}$ of a union
of $n$ subspaces is generated in degree less than or equal to $n$. In fact, this is true:

Proposition 55. Let $\mathcal{A}$ be an arrangement of $n$ linear subspaces of $\mathbb{R}^D$. Then
its vanishing ideal $\mathcal{I}_\mathcal{A}$ is generated in degree $\le n$.

Proof. By [6] the Castelnuovo-Mumford regularity of $\mathcal{I}_\mathcal{A}$ is bounded above by
$n$. But by definition, the CM-regularity of an ideal bounds from above the maximal
degree of a generator of the ideal. □

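A crucial property of a subspace arrangement $\mathcal{A}$ in relation to the theory of
Algebraic Subspace Clustering is that for any non-zero vanishing polynomial $p$ on $\mathcal{A}$,
the orthogonal complement of the space spanned by the gradient of $p$ at some point
$x \in \mathcal{A}$ contains the subspace to which $x$ belongs.

Proposition 56. Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a subspace arrangement of $\mathbb{R}^D$, $p \in \mathcal{I}_\mathcal{A}$
and $x \in \mathcal{A}$, say $x \in \mathcal{S}_i$ for some $i \in [n]$. Then $\nabla p|_x \perp \mathcal{S}_i$.

Proof. Take $p \in \mathcal{I}_\mathcal{A}$. From $\mathcal{I}_\mathcal{A} = \mathcal{I}_{\mathcal{S}_1} \cap \cdots \cap \mathcal{I}_{\mathcal{S}_n}$ we have that $\mathcal{I}_\mathcal{A} \subseteq \mathcal{I}_{\mathcal{S}_i}$. Hence
$p \in \mathcal{I}_{\mathcal{S}_i}$. Now, from Proposition 52 we know that $\mathcal{I}_{\mathcal{S}_i}$ is generated by a basis among
all linear forms that vanish on $\mathcal{S}_i$, i.e., by a basis of $\mathcal{I}_{\mathcal{S}_i,1}$. If $(b_{i1},\dots,b_{ic_i})$ is an
$\mathbb{R}$-basis for $\mathcal{S}_i^\perp$, then $(b_{i1}^\top x,\dots,b_{ic_i}^\top x)$ is an $\mathbb{R}$-basis for $\mathcal{I}_{\mathcal{S}_i,1}$ and a set of generators
for $\mathcal{I}_{\mathcal{S}_i}$ over $\mathbb{R}[x]$. Hence we can write $p(x) = \sum_{j=1}^{c_i} (b_{ij}^\top x) g_j(x)$ where $g_j(x) \in \mathbb{R}[x]$.
Taking the gradient of both sides of the above equation we get
$\nabla p = \sum_{j=1}^{c_i} g_j(x) b_{ij} + \sum_{j=1}^{c_i} (b_{ij}^\top x) \nabla g_j$. Now let $x \in \mathcal{S}_i$ be any point of $\mathcal{S}_i$. Evaluating both sides at $x$ we
have $\nabla p|_x = \sum_{j=1}^{c_i} g_j(x) b_{ij} + \sum_{j=1}^{c_i} (b_{ij}^\top x) \nabla g_j|_x$. By hypothesis we have $b_{ij}^\top x = 0$, $\forall j$,
and so we obtain $\nabla p|_x = \sum_{j=1}^{c_i} g_j(x) b_{ij} \in \mathcal{S}_i^\perp$. □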
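The statement of the proposition can be checked on synthetic data. In the sketch below (our own illustration, with hypothetical helper names `veronese` and `gradient`), a degree-2 vanishing polynomial is fitted to noiseless samples from a random plane and a random line of $\mathbb{R}^3$ as a null vector of the embedded data matrix, and its gradient at each sample is compared against a basis of the subspace containing that sample.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponent vectors of the 6 monomials of degree 2 in (x1, x2, x3).
EXPS = np.array([(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)])

def veronese(X):
    # Rows of X are points; column m of the result is the m-th monomial.
    return np.prod(X[:, None, :] ** EXPS[None, :, :], axis=2)

def gradient(coef, x):
    # Gradient at x of p(x) = sum_m coef[m] * x**EXPS[m].
    g = np.zeros(3)
    for c, e in zip(coef, EXPS):
        for k in range(3):
            if e[k] > 0:
                reduced = e.copy()
                reduced[k] -= 1
                g[k] += c * e[k] * np.prod(x ** reduced)
    return g

# A random plane S1 and a random line S2 through the origin of R^3.
U1, _ = np.linalg.qr(rng.standard_normal((3, 2)))
U2, _ = np.linalg.qr(rng.standard_normal((3, 1)))
X1 = (U1 @ rng.standard_normal((2, 50))).T
X2 = (U2 @ rng.standard_normal((1, 50))).T

# A vanishing polynomial of degree 2: a null vector of the embedded data.
V = veronese(np.vstack([X1, X2]))
coef = np.linalg.svd(V)[2][-1]

# Largest component of a gradient along the subspace of its own point.
leak1 = max(np.abs(U1.T @ gradient(coef, x)).max() for x in X1)
leak2 = max(np.abs(U2.T @ gradient(coef, x)).max() for x in X2)
print(leak1, leak2)  # both numerically zero
```

For points on the plane the gradient is therefore a multiple of the plane's normal, whereas for points on the line it is only constrained to lie in the 2-dimensional orthogonal complement of the line.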

One may wonder when it is the case that the gradient of a vanishing polynomial
on a subspace arrangement $\mathcal{A}$ is zero at every point of $\mathcal{A}$. This is answered by
the following proposition.

Note on the proof of Proposition 55: see [7], [4], [6] or [5] for the definition of
Castelnuovo-Mumford regularity.

Proposition 57. Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a subspace arrangement of $\mathbb{R}^D$ and let
$p \in \mathcal{I}_\mathcal{A}$. Then $\nabla p|_x = 0$, $\forall x \in \mathcal{A}$ if and only if $p \in \bigcap_{i=1}^n \mathcal{I}_{\mathcal{S}_i}^2$.

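Proof. ($\Rightarrow$) Suppose that $p \in \mathcal{I}_\mathcal{A}$, such that $\nabla p|_x = 0$, $\forall x \in \mathcal{A}$. Since $\mathcal{I}_\mathcal{A} \subseteq
\mathcal{I}_{\mathcal{S}_i}$, $\forall i \in [n]$, by Proposition 52, $p(x)$ can be written as

(42)    $p(x) = \sum_{j=1}^{c_i} g_{i,j}(x) (b_{i,j}^\top x),$

where $c_i$ is the codimension of $\mathcal{S}_i$, $(b_{i,1},\dots,b_{i,c_i})$ is a basis for $\mathcal{S}_i^\perp$ and $g_{i,j}(x)$ are
polynomials. Now the hypothesis $\nabla p|_x = 0$, $\forall x \in \mathcal{A}$ implies that $\partial p/\partial x_k|_x = 0$, $\forall x \in
\mathcal{A}$, $\forall k \in [D]$. Thus $\partial p/\partial x_k \in \mathcal{I}_\mathcal{A}$ and so $\partial p/\partial x_k \in \mathcal{I}_{\mathcal{S}_i}$. Hence, again by Proposition
52, $\partial p/\partial x_k$ can be written as

(43)    $\partial p/\partial x_k = \sum_{j=1}^{c_i} h_{i,j,k}(x) (b_{i,j}^\top x).$

Differentiating equation (42) with respect to $x_k$ gives

(44)    $\partial p/\partial x_k = \sum_{j=1}^{c_i} (\partial g_{i,j}/\partial x_k) (b_{i,j}^\top x) + \sum_{j=1}^{c_i} g_{i,j}(x) b_{i,j}(k).$

From equations (43), (44) we obtain

(45)    $\sum_{j=1}^{c_i} g_{i,j}(x) b_{i,j}(k) = \sum_{j=1}^{c_i} \big(h_{i,j,k}(x) - \partial g_{i,j}/\partial x_k\big) (b_{i,j}^\top x),$

which can equivalently be written as

(46)    $\sum_{j=1}^{c_i} b_{i,j}(k) g_{i,j}(x) = \sum_{j=1}^{c_i} q_{i,j,k}(x) (b_{i,j}^\top x),$

where $q_{i,j,k}(x) := h_{i,j,k}(x) - \partial g_{i,j}/\partial x_k$. Note that equation (46) is true for every
$k \in [D]$. We can write these $D$ equations in matrix form

(47)    $\begin{bmatrix} b_{i,1} & b_{i,2} & \cdots & b_{i,c_i} \end{bmatrix} \begin{bmatrix} g_{i,1}(x) \\ g_{i,2}(x) \\ \vdots \\ g_{i,c_i}(x) \end{bmatrix} = Q(x) \begin{bmatrix} b_{i,1}^\top x \\ b_{i,2}^\top x \\ \vdots \\ b_{i,c_i}^\top x \end{bmatrix},$

where $Q(x)$ is a $D \times c_i$ polynomial matrix with entries in $\mathbb{R}[x]$, whose $(k,j)$ entry is $q_{i,j,k}(x)$. We can view
equation (47) as a linear system of equations over the field $\mathbb{R}(x)$. Define $B_i :=
[\, b_{i,1} \; b_{i,2} \; \cdots \; b_{i,c_i} \,]$. The columns of $B_i$ form a basis of $\mathcal{S}_i^\perp$ and so they will be
linearly independent over $\mathbb{R}$. Consequently, the square matrix $B_i^\top B_i$ will be invertible
over $\mathbb{R}$ and its inverse will also be the inverse of $B_i^\top B_i$ over the larger field $\mathbb{R}(x)$.
Multiplying both sides of equation (47) from the left with $(B_i^\top B_i)^{-1} B_i^\top$, we obtain

(48)    $\begin{bmatrix} g_{i,1}(x) \\ g_{i,2}(x) \\ \vdots \\ g_{i,c_i}(x) \end{bmatrix} = (B_i^\top B_i)^{-1} B_i^\top Q(x) \begin{bmatrix} b_{i,1}^\top x \\ b_{i,2}^\top x \\ \vdots \\ b_{i,c_i}^\top x \end{bmatrix}.$

Note that $(B_i^\top B_i)^{-1} B_i^\top Q(x) \in \mathbb{R}[x]^{c_i \times c_i}$ and so equation (48) gives that $g_{i,j}(x) \in \mathcal{I}_{\mathcal{S}_i}$,
$\forall j \in [c_i]$. Returning back to equation (42), we readily see that $p \in \mathcal{I}_{\mathcal{S}_i}^2$, $\forall i \in [n]$,
which implies that $p \in \bigcap_{i=1}^n \mathcal{I}_{\mathcal{S}_i}^2$.

($\Leftarrow$) Suppose that $p \in \bigcap_{i=1}^n \mathcal{I}_{\mathcal{S}_i}^2$. Since $\bigcap_{i=1}^n \mathcal{I}_{\mathcal{S}_i}^2 \subseteq \bigcap_{i=1}^n \mathcal{I}_{\mathcal{S}_i} = \mathcal{I}_\mathcal{A}$, we see that
$p$ must be a vanishing polynomial. Since $p \in \mathcal{I}_{\mathcal{S}_i}^2$, by Proposition 52 we can write
$p(x) = \sum_{j,j'=1}^{c_i} g_{i,j,j'}(x) (b_{i,j}^\top x)(b_{i,j'}^\top x)$, from which it follows that $\nabla p|_{x_i} = 0$, $\forall x_i \in \mathcal{S}_i$.
Since this holds for any $i \in [n]$, we get that $\nabla p|_x = 0$, $\forall x \in \mathcal{A}$. □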
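A two-line example in $\mathbb{R}^2$ (ours, for illustration) makes the dichotomy explicit. Let $\mathcal{S}_1 = \{x_1 = 0\}$ and $\mathcal{S}_2 = \{x_2 = 0\}$, so that $\mathcal{I}_{\mathcal{S}_1} = (x_1)$ and $\mathcal{I}_{\mathcal{S}_2} = (x_2)$, and compare the vanishing polynomials $p = x_1 x_2$ and $q = x_1^2 x_2^2$:

```latex
\[
p = x_1 x_2 \notin (x_1^2) = \mathcal{I}_{\mathcal{S}_1}^2 ,
\qquad
\nabla p = (x_2,\, x_1),
\qquad
\nabla p\,|_{(0,t)} = (t,\, 0) \neq 0 \ \text{ for } t \neq 0 ,
\]
\[
q = x_1^2 x_2^2 \in (x_1^2) \cap (x_2^2) ,
\qquad
\nabla q = (2 x_1 x_2^2,\; 2 x_1^2 x_2),
\qquad
\nabla q\,|_x = 0 \ \text{ whenever } x_1 = 0 \text{ or } x_2 = 0 .
\]
```

Only $q$, which lies in the square of each vanishing ideal, has a gradient that vanishes on the whole arrangement.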

We conclude with a theorem lying at the heart of Algebraic Subspace Clustering.

Theorem 58. Let $\mathcal{A} = \bigcup_{i=1}^n \mathcal{S}_i$ be a transversal subspace arrangement of $\mathbb{R}^D$ with
vanishing ideal $\mathcal{I}_\mathcal{A}$. Let $\mathcal{J}_\mathcal{A}$ be the product ideal $\mathcal{J}_\mathcal{A} = \mathcal{I}_{\mathcal{S}_1} \cdots \mathcal{I}_{\mathcal{S}_n}$. Then the two ideals
are equal at degrees $\ell \ge n$, i.e., $\mathcal{I}_{\mathcal{A},\ell} = \mathcal{J}_{\mathcal{A},\ell}$, $\forall \ell \ge n$.

Theorem 58 implies that every polynomial of degree $n$ that vanishes on a transversal
subspace arrangement $\mathcal{A}$ of $n$ subspaces is a linear combination of products of $n$ linear
forms, one vanishing on each subspace of $\mathcal{A}$, a fundamental fact that is used repeatedly in the main text of
the paper. Theorem 58 was first proved in Proposition 3.4 of [4], in the context of
the Castelnuovo-Mumford regularity of products of ideals generated by linear forms.
It was later reproved in [5] using a Hilbert series argument and the result from [6] on
the Castelnuovo-Mumford regularity of a subspace arrangement.
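The equality of the two ideals in degree $n$ can be observed numerically in a small case. The sketch below is our own illustration for $n = 2$ (a random plane and a random line in $\mathbb{R}^3$); it estimates $\dim \mathcal{I}_{\mathcal{A},2}$ as the nullity of the degree-2 embedded data matrix, which assumes the samples are in general position, and compares it with the dimension of the span of products of linear forms. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Monomial order for degree 2 in R^3: x1^2, x1x2, x1x3, x2^2, x2x3, x3^2.
def veronese2(X):
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([x1 * x1, x1 * x2, x1 * x3, x2 * x2, x2 * x3, x3 * x3])

def product_coef(b, c):
    # Coefficients of (b^T x)(c^T x) in the monomial order above.
    M = np.outer(b, c)
    S = M + M.T
    return np.array([M[0, 0], S[0, 1], S[0, 2], M[1, 1], S[1, 2], M[2, 2]])

# Random plane S1 and line S2 in R^3 (transversal with probability 1).
U1 = rng.standard_normal((3, 2))
U2 = rng.standard_normal((3, 1))
b = np.linalg.svd(U1)[0][:, 2]    # normal vector of the plane
C = np.linalg.svd(U2)[0][:, 1:]   # two independent normal vectors of the line

X = np.vstack([(U1 @ rng.standard_normal((2, 40))).T,
               (U2 @ rng.standard_normal((1, 40))).T])
V = veronese2(X)

# dim I_{A,2}: degree-2 polynomials vanishing on all samples.
dim_I = 6 - np.linalg.matrix_rank(V, tol=1e-8)

# dim J_{A,2}: span of products with one linear form per subspace.
J = np.column_stack([product_coef(b, C[:, j]) for j in range(2)])
dim_J = np.linalg.matrix_rank(J, tol=1e-8)

print(dim_I, dim_J, np.abs(V @ J).max())
```

Both dimensions agree, and the products of linear forms vanish on every sample, which is consistent with $\mathcal{I}_{\mathcal{A},2} = \mathcal{J}_{\mathcal{A},2}$ for this arrangement.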